Batch Active Learning

apax.bal.api.compute_features(feature_fn: Callable[[FrozenDict, dict], Array], dataset: OTFInMemoryDataset) -> ndarray

Compute the features of a dataset.

apax.bal.api.feature_fn

Function to compute the features with.

apax.bal.api.dataset

Dataset to compute the features for.

apax.bal.api.create_feature_fn(model: EnergyModel, params: FrozenDict, base_feature_map: FeatureTransformation, feature_transforms=[], is_ensemble: bool = False)

Converts a model into a feature map, applies the requested transformations, and sets it up for computing the features of a dataset.

All transformations are applied to the feature function, not to computed features. Only the final function is JIT-compiled.

apax.bal.api.model

Model to be transformed.

Type:

EnergyModel

apax.bal.api.params

Model parameters

Type:

FrozenDict

apax.bal.api.base_feature_map

Class that transforms the model into a FeatureMap

Type:

FeatureTransformation

apax.bal.api.is_ensemble

Whether to apply the ensemble transformation, i.e. an averaging of kernels over the members of a model ensemble.

Type:

bool

apax.bal.api.kernel_selection(model_dir: Path | List[Path], train_atoms: List[Atoms], pool_atoms: List[Atoms], base_fm_options: dict, selection_method: str, feature_transforms: list = [], selection_batch_size: int = 10, processing_batch_size: int = 64) -> list[int]

Main function for batch data selection. Currently, only the last-layer gradient features and the MaxDist selection method are available. More can be added as needed, since this function is agnostic to the internals of the feature map and the selection method.

apax.bal.api.model_dir

Path to the trained model or models which should be used to compute features.

Type:

Union[Path, List[Path]]

apax.bal.api.train_atoms

List of ase.Atoms used to train the models.

Type:

List[Atoms]

apax.bal.api.pool_atoms

List of ase.Atoms to select new data from.

Type:

List[Atoms]

apax.bal.api.base_fm_options

Settings for the base feature map.

apax.bal.api.selection_method

Currently only "max_dist" is supported.

apax.bal.api.feature_transforms

Feature transforms to be applied on top of the base feature map transform. Examples include multiplication by a constant or addition of a constant.

apax.bal.api.selection_batch_size

Number of new data points to be selected from pool_atoms.

apax.bal.api.processing_batch_size

Number of data points for which features are computed at once. Does not affect the results, only the speed of processing.
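The claim that the processing batch size changes only speed, not results, can be illustrated with a minimal NumPy sketch. It does not use apax: `toy_feature_fn` and `compute_in_batches` are hypothetical stand-ins, assuming a feature function that maps each input row to one feature row independently of the other rows.

```python
import numpy as np


def toy_feature_fn(batch: np.ndarray) -> np.ndarray:
    """Hypothetical feature function: one feature row per input row."""
    return np.tanh(batch) * 2.0


def compute_in_batches(inputs: np.ndarray, batch_size: int) -> np.ndarray:
    """Apply the feature function chunk by chunk and stitch the results together."""
    chunks = [
        toy_feature_fn(inputs[start:start + batch_size])
        for start in range(0, len(inputs), batch_size)
    ]
    return np.concatenate(chunks, axis=0)


rng = np.random.default_rng(0)
inputs = rng.normal(size=(10, 4))

# Two very different batch sizes yield the same feature matrix.
features_small = compute_in_batches(inputs, batch_size=3)
features_large = compute_in_batches(inputs, batch_size=64)
print(np.allclose(features_small, features_large))
```

In practice a larger batch size mostly trades memory for throughput, so a value that fits comfortably on the available device is a reasonable starting point.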

class apax.bal.feature_maps.FeatureTransformation
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class apax.bal.feature_maps.FullGradientRPFeatures(*, name: Literal['full_grad_rp'] = 'full_grad_rp', num_rp: int = 512)

Model transformation which computes the gradient of the output with respect to all parameters and applies a Gaussian random projection for dimensionality reduction. See https://arxiv.org/pdf/2203.09410

Parameters:

num_rp (int) – Dimensionality to reduce the features to.

model_config = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
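As an illustration of the Gaussian random projection mentioned for FullGradientRPFeatures, here is a minimal NumPy sketch. It is not the apax implementation: `gaussian_random_projection`, the `seed` argument, and the 1/sqrt(num_rp) scaling are assumptions chosen so that squared norms are preserved in expectation.

```python
import numpy as np


def gaussian_random_projection(features: np.ndarray, num_rp: int, seed: int = 0) -> np.ndarray:
    """Project rows of `features` to `num_rp` dimensions with a random Gaussian matrix."""
    rng = np.random.default_rng(seed)
    n_features = features.shape[1]
    # Scaling by 1/sqrt(num_rp) keeps squared norms unchanged in expectation.
    projection = rng.normal(size=(n_features, num_rp)) / np.sqrt(num_rp)
    return features @ projection


rng = np.random.default_rng(1)
gradients = rng.normal(size=(5, 2000))  # stand-in for per-structure full gradients
reduced = gaussian_random_projection(gradients, num_rp=512)

print(reduced.shape)
# Ratio of squared norms after and before projection; values close to 1 are expected.
norm_ratio = np.sum(reduced**2, axis=1) / np.sum(gradients**2, axis=1)
print(norm_ratio)
```

A larger num_rp approximates the original kernel more closely at the cost of larger feature vectors.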

class apax.bal.feature_maps.IdentityFeatures(*, name: Literal['identity'])

Identity feature map, intended for debugging purposes.

model_config = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class apax.bal.feature_maps.LastLayerForceFeatures(*, name: Literal['ll_force_feat'] = 'll_force_feat', layer_name: str = 'dense_2', strategy: str = 'raw')

Model transformation which computes the Jacobian of the forces with respect to the specified layer. For BAL, the strategy "flatten" has to be selected.

Parameters:
  • layer_name (str) – Name of the layer with respect to which to take the Jacobian.

  • strategy (str) – One of raw, sum, flatten. Only flatten seems to work for BAL; raw is required for LLPR.

model_config = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class apax.bal.feature_maps.LastLayerGradientFeatures(*, name: Literal['ll_grad'] = 'll_grad', layer_name: str = 'dense_2')

Model transformation which computes the gradient of the output with respect to the specified layer. See https://arxiv.org/pdf/2203.09410

Parameters:

layer_name (str) – Name of the layer with respect to which to take the gradient.

model_config = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

apax.bal.feature_maps.extract_feature_params(params: dict, layer_name: str) -> Tuple[dict, dict]

Separate params into those belonging to a selected layer and the remaining ones.
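The separation described above can be sketched in plain Python. This is not the apax implementation: `split_params`, the nested-dict layout of the example parameters, and the order of the returned tuple (selected layer first, remainder second) are all assumptions made for illustration.

```python
def split_params(params: dict, layer_name: str) -> tuple[dict, dict]:
    """Split a nested parameter dict into (selected layer, everything else)."""
    selected, remaining = {}, {}
    for key, value in params.items():
        if key == layer_name:
            # The whole subtree of the named layer goes to the selected part.
            selected[key] = value
        elif isinstance(value, dict):
            # Recurse so that the layer is found at any nesting depth.
            sub_selected, sub_remaining = split_params(value, layer_name)
            if sub_selected:
                selected[key] = sub_selected
            if sub_remaining:
                remaining[key] = sub_remaining
        else:
            remaining[key] = value
    return selected, remaining


# Hypothetical parameter tree with two dense layers.
example_params = {"params": {"dense_1": {"w": [1.0]}, "dense_2": {"w": [2.0]}}}
layer_params, other_params = split_params(example_params, "dense_2")
print(layer_params)
print(other_params)
```

Keeping the two parts in the same nested layout makes it straightforward to differentiate with respect to one part while holding the other fixed.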

class apax.bal.kernel.KernelMatrix(g: ndarray, n_train: int)

Matrix representation of a kernel defined by a feature map g: $K_{ij} = \sum_{k} g_{ik} g_{jk}$

score(idx: int) -> ndarray

Computes the distance of sample i from all other samples j as $K_{ii} + K_{jj} - 2 K_{ij}$
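The two formulas above can be checked numerically with a small NumPy sketch. It builds the full kernel matrix explicitly for clarity; `kernel_distance` is a hypothetical helper, and whether KernelMatrix stores the full matrix or computes rows on demand is not stated here.

```python
import numpy as np

# Three samples with two features each (rows of the feature map g).
g = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 4.0]])

# K_ij = sum_k g_ik g_jk
K = g @ g.T


def kernel_distance(kernel: np.ndarray, i: int) -> np.ndarray:
    """Return K_ii + K_jj - 2 K_ij for a fixed i and all j."""
    diag = np.diag(kernel)
    return diag[i] + diag - 2.0 * kernel[i]


d0 = kernel_distance(K, 0)
print(d0)  # [ 0.  5. 20.]
```

The result equals the squared Euclidean distance between feature vectors, e.g. (1 - 3)^2 + (0 - 4)^2 = 20 for samples 0 and 2.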

apax.bal.selection.max_dist_selection(matrix: KernelMatrix, batch_size: int | None = None)

Iteratively selects samples from the pool which are most distant from all previously selected samples: $\operatorname{argmax}_{S \in \mathbb{X}_{\mathrm{rem}}} \min_{S' \in \mathbb{X}_{\mathrm{sel}}} d(S, S')$

https://arxiv.org/pdf/2203.09410.pdf https://doi.org/10.1039/D2DD00034B

apax.bal.selection.matrix

Kernel used to compare structures.

Type:

KernelMatrix

apax.bal.selection.batch_size

Number of new data points to be selected.

Type:

int
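The greedy max-min rule above can be sketched with NumPy. This is an illustration, not the apax implementation: `greedy_max_dist` is a hypothetical function operating directly on a feature matrix, and it assumes that the first n_train rows are training samples, that they form the initial selected set, and that returned indices are relative to the pool.

```python
import numpy as np


def greedy_max_dist(features: np.ndarray, n_train: int, batch_size: int) -> list[int]:
    """Greedily pick pool samples that maximise the minimum distance to the selected set."""
    n_total = features.shape[0]
    if batch_size > n_total - n_train:
        raise ValueError("batch_size exceeds the number of pool samples")
    sq_norms = np.sum(features**2, axis=1)

    def sq_dist_to_all(i: int) -> np.ndarray:
        # K_ii + K_jj - 2 K_ij for all j, without forming the full kernel matrix.
        return sq_norms[i] + sq_norms - 2.0 * (features @ features[i])

    # Minimum distance of every sample to the current selected set (training data first).
    min_dist = np.full(n_total, np.inf)
    for i in range(n_train):
        min_dist = np.minimum(min_dist, sq_dist_to_all(i))

    available = np.zeros(n_total, dtype=bool)
    available[n_train:] = True
    selected = []
    for _ in range(batch_size):
        scores = np.where(available, min_dist, -np.inf)
        best = int(np.argmax(scores))
        selected.append(best - n_train)  # index relative to the pool
        available[best] = False
        min_dist = np.minimum(min_dist, sq_dist_to_all(best))
    return selected


# One training sample at 0.0 and four pool samples on a line.
features = np.array([[0.0], [1.0], [10.0], [5.0], [0.1]])
picked = greedy_max_dist(features, n_train=1, batch_size=3)
print(picked)  # [1, 2, 0]
```

In the example, 10.0 is picked first because it is farthest from the training point, then 5.0, then 1.0; the near-duplicate 0.1 is left for last.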

apax.bal.transforms.batch_features(feature_fn: Callable[[FrozenDict, dict], Array]) -> Callable[[FrozenDict, dict], Array]

Vectorizes a feature map over structures. Should be the last transformation applied to a feature map.

apax.bal.transforms.ensemble_features(feature_fn: Callable[[FrozenDict, dict], Array]) -> Callable[[FrozenDict, dict], Array]

Feature map transformation which averages the kernels of a model ensemble.
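One way to realise an average of kernels at the level of features is to concatenate the features of all ensemble members and scale them by 1/sqrt(n_models). The NumPy sketch below checks this identity; it is an assumption made for illustration and may differ from what ensemble_features does internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_samples, n_features = 3, 4, 5

# Hypothetical features of each ensemble member: shape (model, sample, feature).
member_features = rng.normal(size=(n_models, n_samples, n_features))

# Average of the per-member kernels K^(m)_ij = sum_k g^(m)_ik g^(m)_jk.
member_kernels = np.einsum("mik,mjk->mij", member_features, member_features)
mean_kernel = member_kernels.mean(axis=0)

# Equivalent feature map: concatenate member features, scaled by 1/sqrt(n_models).
stacked = np.concatenate(list(member_features), axis=1) / np.sqrt(n_models)
stacked_kernel = stacked @ stacked.T

print(np.allclose(mean_kernel, stacked_kernel))
```

Working with a single stacked feature map means the same kernel and selection code can be reused unchanged for ensembles.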