Batch Active Learning¶

apax.bal.api.compute_features(feature_fn: Callable[[FrozenDict, dict], Array], dataset: OTFInMemoryDataset) → ndarray[source]¶

Compute the features of a dataset.

apax.bal.api.feature_fn¶: Function to compute the features with.

apax.bal.api.dataset¶: Dataset to compute the features for.

apax.bal.api.create_feature_fn(model: EnergyModel, params: FrozenDict, base_feature_map: FeatureTransformation, feature_transforms=[], is_ensemble: bool = False)[source]¶

Converts a model into a feature map, applies the requested transformations, and prepares it for computing the features of a dataset.

All transformations are applied on the feature function, not on computed features. Only the final function is jit compiled.

apax.bal.api.model¶

Model to be transformed.

Type:: EnergyModel

apax.bal.api.params¶

Model parameters.

Type:: FrozenDict

apax.bal.api.base_feature_map¶

Class that transforms the model into a FeatureMap.

Type:: FeatureTransformation

apax.bal.api.is_ensemble¶

Whether to apply the ensemble transformation, i.e. an averaging of kernels for model ensembles.

Type:: bool
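The note that transformations act on the feature function rather than on computed features can be illustrated with a small sketch. Everything below is hypothetical and independent of apax: `scale_features`, `base_feature_fn`, and `build_feature_fn` are illustrative names, and NumPy stands in for JAX. In a JAX setting, the single compilation step would presumably be a `jax.jit` call on the final composed function.

```python
import numpy as np


def scale_features(factor):
    """Hypothetical transform: wraps a feature function, scaling its output."""
    def transform(feature_fn):
        def wrapped(params, inputs):
            return factor * feature_fn(params, inputs)
        return wrapped
    return transform


def base_feature_fn(params, inputs):
    # Toy feature map: elementwise product of a weight vector and the input.
    return params["w"] * inputs["x"]


def build_feature_fn(base_fn, transforms):
    # Each transform maps a function to a new function; no features are
    # computed while the pipeline is being assembled.
    fn = base_fn
    for transform in transforms:
        fn = transform(fn)
    # Assumption: with JAX, jit compilation would be applied once, here.
    return fn


feature_fn = build_feature_fn(
    base_feature_fn, [scale_features(2.0), scale_features(0.5)]
)
params = {"w": np.array([1.0, 2.0, 3.0])}
inputs = {"x": np.array([1.0, 1.0, 2.0])}
features = feature_fn(params, inputs)
```

Composing functions first keeps the whole pipeline visible to a compiler as one unit, which is the likely motivation for compiling only the final function.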

apax.bal.api.kernel_selection(model_dir: Path | List[Path], train_atoms: List[Atoms], pool_atoms: List[Atoms], base_fm_options: dict, selection_method: str, feature_transforms: list = [], selection_batch_size: int = 10, processing_batch_size: int = 64) → list[int][source]¶

Main function to facilitate batch data selection. Currently only the last-layer gradient features and the MaxDist selection method are available. More can be added as needed, since this function is agnostic of the internals of the feature map and the selection method.

apax.bal.api.model_dir¶

Path to the trained model or models which should be used to compute features.

Type:: Union[Path, List[Path]]

apax.bal.api.train_atoms¶

List of ase.Atoms used to train the models.

Type:: List[Atoms]

apax.bal.api.pool_atoms¶

List of ase.Atoms to select new data from.

Type:: List[Atoms]

apax.bal.api.base_fm_options¶: Settings for the base feature map.

apax.bal.api.selection_method¶: Currently only “max_dist” is supported.

apax.bal.api.feature_transforms¶: Feature transforms to be applied on top of the base feature map transform. Examples would include multiplication with or addition of a constant.

apax.bal.api.selection_batch_size¶: Number of new data points to be selected from pool_atoms.

apax.bal.api.processing_batch_size¶: Number of data points to compute the features for at once. Does not affect the results, only the processing speed.
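A call to `kernel_selection` might look like the sketch below, which follows the signature documented above. The keys of `base_fm_options` are an assumption: they mirror the fields of `LastLayerGradientFeatures` and are not confirmed by this page. The wrapper name `select_new_structures` is hypothetical, and the import is deferred so the snippet can be read without apax installed.

```python
# Assumed option keys, mirroring the LastLayerGradientFeatures fields.
base_fm_options = {"name": "ll_grad", "layer_name": "dense_2"}


def select_new_structures(model_dir, train_atoms, pool_atoms, n_select=10):
    """Hypothetical wrapper around kernel_selection for a single trained model."""
    # Deferred import: apax is only needed when the function is called.
    from apax.bal.api import kernel_selection

    return kernel_selection(
        model_dir=model_dir,
        train_atoms=train_atoms,
        pool_atoms=pool_atoms,
        base_fm_options=base_fm_options,
        selection_method="max_dist",
        feature_transforms=[],
        selection_batch_size=n_select,
        processing_batch_size=64,
    )
```

According to the signature, the return value is a list of integers, presumably indices of the selected structures.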

class apax.bal.feature_maps.FeatureTransformation[source]¶

model_config = {}¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class apax.bal.feature_maps.FullGradientRPFeatures(*, name: Literal['full_grad_rp'] = 'full_grad_rp', num_rp: int = 512)[source]¶

Model transformation which computes the gradient of the output wrt. all parameters and applies a Gaussian random projection for dimensionality reduction. https://arxiv.org/pdf/2203.09410

Parameters:: num_rp (int) – Dimensionality to reduce the features to.

model_config = {'extra': 'forbid'}¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
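The Gaussian random projection used by `FullGradientRPFeatures` can be sketched in plain NumPy. This is an illustration of the general technique, not the apax implementation: the array sizes, the random gradients, and the `1 / sqrt(num_rp)` scaling are assumptions chosen so that inner products are preserved in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_params, num_rp = 4, 2000, 512

# Stand-in for per-sample gradients wrt. all model parameters.
grads = rng.normal(size=(n_samples, n_params))

# Gaussian projection matrix; the scaling keeps E[P @ P.T] equal to the identity.
projection = rng.normal(size=(n_params, num_rp)) / np.sqrt(num_rp)

# Reduced features of shape (n_samples, num_rp).
features = grads @ projection
```

The kernel built from `features` approximates the one built from `grads`, at a cost that scales with `num_rp` instead of the number of parameters.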

class apax.bal.feature_maps.IdentityFeatures(*, name: Literal['identity'])[source]¶

Identity feature map, intended for debugging purposes.

model_config = {'extra': 'forbid'}¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class apax.bal.feature_maps.LastLayerForceFeatures(*, name: Literal['ll_force_feat'] = 'll_force_feat', layer_name: str = 'dense_2', strategy: str = 'raw')[source]¶

Model transformation which computes the Jacobian of the forces wrt. the specified layer. For BAL the strategy “flatten” has to be selected.

Parameters:

layer_name (str) – Name of the layer wrt. which to take the Jacobian.
strategy (str) – One of raw, sum, flatten. Only flatten seems to work for BAL; raw is required for LLPR.

model_config = {'extra': 'forbid'}¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class apax.bal.feature_maps.LastLayerGradientFeatures(*, name: Literal['ll_grad'] = 'll_grad', layer_name: str = 'dense_2')[source]¶

Model transformation which computes the gradient of the output wrt. the specified layer. https://arxiv.org/pdf/2203.09410

Parameters:: layer_name (str) – Name of the layer wrt. which to take the gradient.

model_config = {'extra': 'forbid'}¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
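To make the idea behind last-layer gradient features concrete, here is a toy sketch with a two-layer network whose final layer is linear. It is not the apax model: the parameter names `w1`, `b1`, `w2`, `b2` and all shapes are hypothetical. For a linear final layer, the gradient of the scalar output wrt. that layer's weights is the hidden activation, and the gradient wrt. its bias is 1.

```python
import numpy as np


def hidden(params, x):
    # Activations feeding into the last layer.
    return np.tanh(params["w1"] @ x + params["b1"])


def model(params, x):
    # Scalar output of a toy network with a linear last layer.
    return params["w2"] @ hidden(params, x) + params["b2"]


def last_layer_grad_features(params, x):
    # d(output)/d(w2) is the hidden activation; d(output)/d(b2) is 1.
    return np.concatenate([hidden(params, x), [1.0]])


rng = np.random.default_rng(0)
params = {
    "w1": rng.normal(size=(3, 2)),
    "b1": rng.normal(size=3),
    "w2": rng.normal(size=3),
    "b2": 0.1,
}
x = np.array([0.5, -1.0])
features = last_layer_grad_features(params, x)
```

In a JAX model the same quantity would typically come from automatic differentiation rather than a hand-written formula.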

apax.bal.feature_maps.extract_feature_params(params: dict, layer_name: str) → Tuple[dict, dict][source]¶: Separate params into those belonging to a selected layer and the remaining ones.
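The split performed by `extract_feature_params` can be pictured with a flat dictionary. This is a simplified, hypothetical sketch: `split_params` is an illustrative name, and real parameter trees are likely nested, which this version does not handle.

```python
def split_params(params, layer_name):
    """Hypothetical flat-dict version of splitting params by layer name."""
    selected = {k: v for k, v in params.items() if k == layer_name}
    remaining = {k: v for k, v in params.items() if k != layer_name}
    return selected, remaining


toy_params = {"dense_0": [1.0], "dense_1": [2.0], "dense_2": [3.0]}
feature_params, other_params = split_params(toy_params, "dense_2")
```

Keeping the two groups separate allows differentiation wrt. the selected layer only, while the remaining parameters are treated as constants.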

class apax.bal.kernel.KernelMatrix(g: ndarray, n_train: int)[source]¶

Matrix representation of a kernel defined by a feature map g: $K_{ij} = \sum_{k} g_{ik} g_{jk}$

score(idx: int) → ndarray[source]¶: Computes the distance of sample i (given by idx) from all other samples j as $K_{ii} + K_{jj} - 2 K_{ij}$, which is the squared Euclidean distance between the feature vectors of i and j.
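The kernel and distance formulas above can be checked on a small example. This is a NumPy sketch, not the `KernelMatrix` class itself: it forms the full matrix explicitly, which the actual class may avoid, and `distance_to_all` is a hypothetical helper name.

```python
import numpy as np

# Three samples with two features each.
g = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# K_ij = sum_k g_ik g_jk
K = g @ g.T


def distance_to_all(K, idx):
    # K_ii + K_jj - 2 K_ij for fixed i = idx and all j.
    diag = np.diag(K)
    return diag[idx] + diag - 2.0 * K[idx]


d0 = distance_to_all(K, 0)  # distances of sample 0 to samples 0, 1, 2
```

Here `d0` equals `[0, 2, 1]`, matching the squared Euclidean distances between the rows of `g`.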

apax.bal.selection.max_dist_selection(matrix: KernelMatrix, batch_size: int | None = None)[source]¶

Iteratively selects samples from the pool which are most distant from all previously selected samples: $\operatorname{argmax}_{S \in \mathbb{X}_{\mathrm{rem}}} \min_{S' \in \mathbb{X}_{\mathrm{sel}}} d(S, S')$

https://arxiv.org/pdf/2203.09410.pdf https://doi.org/10.1039/D2DD00034B

apax.bal.selection.matrix¶

Kernel used to compare structures.

Type:: KernelMatrix

apax.bal.selection.batch_size¶

Number of new data points to be selected.

Type:: int
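A greedy max-distance selection along the lines of the formula above can be sketched as follows. This is an illustration under assumptions, not the apax code: the first `n_train` rows of `g` are taken to be the training set (treated as already selected), distances are squared Euclidean distances in feature space, and returning pool-relative indices is a choice made for this sketch.

```python
import numpy as np


def max_dist_select(g, n_train, batch_size):
    """Greedy farthest-point selection on feature rows (hypothetical helper)."""
    sq_norms = np.sum(g * g, axis=1)

    def dist_to(idx):
        # Squared distance of sample idx to every sample.
        return sq_norms[idx] + sq_norms - 2.0 * (g @ g[idx])

    # Distance of each sample to its nearest already-selected sample.
    min_dist = np.full(len(g), np.inf)
    for i in range(n_train):
        min_dist = np.minimum(min_dist, dist_to(i))

    available = np.ones(len(g), dtype=bool)
    available[:n_train] = False

    selected = []
    for _ in range(batch_size):
        # Pick the remaining sample farthest from the selected set.
        idx = int(np.argmax(np.where(available, min_dist, -np.inf)))
        selected.append(idx - n_train)
        available[idx] = False
        min_dist = np.minimum(min_dist, dist_to(idx))
    return selected


g = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.0, 0.1], [0.0, 3.0]])
picked = max_dist_select(g, n_train=1, batch_size=2)
```

With one training point at the origin, the sketch first picks the farthest pool point and then the point at `[0, 3]`, skipping the near-duplicate at `[5, 0]`.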

apax.bal.transforms.batch_features(feature_fn: Callable[[FrozenDict, dict], Array]) → Callable[[FrozenDict, dict], Array][source]¶: Vectorizes a feature map over structures. Should be the last transformation applied to a feature map.
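Vectorizing a feature map over structures can be pictured with a plain Python stand-in. The sketch below is hypothetical and uses an explicit loop with NumPy. In JAX, the equivalent would presumably be `jax.vmap(feature_fn, in_axes=(None, 0))`, mapping over the leading axis of the inputs while sharing the parameters.

```python
import numpy as np


def batch_features_sketch(feature_fn):
    """Hypothetical loop-based stand-in for vectorizing over structures."""
    def batched(params, batch):
        n_structures = len(next(iter(batch.values())))
        rows = [
            # Slice out structure i from every input array.
            feature_fn(params, {key: value[i] for key, value in batch.items()})
            for i in range(n_structures)
        ]
        return np.stack(rows)
    return batched


def toy_feature_fn(params, structure):
    # Per-structure feature map: elementwise product with a weight vector.
    return params["w"] * structure["x"]


batched_fn = batch_features_sketch(toy_feature_fn)
params = {"w": np.array([1.0, 2.0])}
batch = {"x": np.arange(6, dtype=float).reshape(3, 2)}
batched_features = batched_fn(params, batch)  # shape (3, 2)
```

Applying this step last means every earlier transformation only has to handle a single structure.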

apax.bal.transforms.ensemble_features(feature_fn: Callable[[FrozenDict, dict], Array]) → Callable[[FrozenDict, dict], Array][source]¶: Feature map transformation which averages the kernels of a model ensemble.
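One way to average the kernels of an ensemble at the level of features is to concatenate the per-model features and divide by the square root of the ensemble size. The NumPy sketch below verifies this identity on random data. Whether apax uses exactly this construction is not stated here, so treat it as an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_samples, n_features = 3, 5, 4

# Stand-in features from each ensemble member: (model, sample, feature).
per_model = rng.normal(size=(n_models, n_samples, n_features))

# Direct average of the per-model kernels.
avg_kernel = np.mean([g @ g.T for g in per_model], axis=0)

# Concatenated, rescaled features reproduce the averaged kernel.
stacked = np.concatenate(list(per_model), axis=1) / np.sqrt(n_models)
kernel_from_stacked = stacked @ stacked.T
```

Both kernels agree up to floating-point error, so downstream selection code only needs a single feature matrix.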