Batch Active Learning#

apax.bal.api.compute_features(feature_fn: Callable[[FrozenDict, dict], Array], dataset: OTFInMemoryDataset) → ndarray[source]#

Compute the features of a dataset.

apax.bal.api.feature_fn#: Function to compute the features with.

apax.bal.api.dataset#: Dataset to compute the features for.
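
The following is a minimal NumPy sketch of computing features for a dataset chunk by chunk. It is not the apax implementation: `compute_features_in_batches` and `toy_feature_fn` are hypothetical names, and a plain array stands in for the dataset. The feature function only mirrors the `(params, inputs)` calling convention from the signature above.

```python
import numpy as np


def compute_features_in_batches(feature_fn, params, inputs, batch_size):
    """Apply feature_fn to consecutive chunks of inputs and stack the results."""
    chunks = []
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start:start + batch_size]
        chunks.append(feature_fn(params, batch))
    return np.concatenate(chunks, axis=0)


def toy_feature_fn(params, batch):
    """Hypothetical feature function: two scaled summary features per sample."""
    return params["scale"] * np.stack(
        [batch.sum(axis=1), (batch ** 2).sum(axis=1)], axis=1
    )


params = {"scale": 2.0}
inputs = np.arange(20.0).reshape(10, 2)  # 10 samples, 2 raw values each

features_small = compute_features_in_batches(toy_feature_fn, params, inputs, batch_size=3)
features_large = compute_features_in_batches(toy_feature_fn, params, inputs, batch_size=64)
```

Both calls produce the same feature array. In this sketch the batch size only changes how the work is chunked.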

apax.bal.api.create_feature_fn(model: EnergyModel, params: FrozenDict, base_feature_map: FeatureTransformation, feature_transforms=[], is_ensemble: bool = False)[source]#

Converts a model into a feature map, applies the requested transformations, and sets it up for computing the features of a dataset.

All transformations are applied to the feature function, not to computed features. Only the final function is JIT-compiled.

apax.bal.api.model#

Model to be transformed.

Type:: EnergyModel

apax.bal.api.params#

Model parameters.

Type:: FrozenDict

apax.bal.api.base_feature_map#

Class that transforms the model into a FeatureMap.

Type:: FeatureTransformation

apax.bal.api.is_ensemble#

Whether to apply the ensemble transformation, i.e. averaging the kernels of the models in an ensemble.

Type:: bool
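
Applying transformations to the feature function rather than to computed features can be illustrated with plain function composition. This is a sketch in pure Python, not apax code: `scale_by`, `compose`, and `base_feature_fn` are hypothetical, and no JIT compilation is performed here.

```python
from typing import Callable, List, Sequence

FeatureFn = Callable[[dict, float], List[float]]


def scale_by(factor: float) -> Callable[[FeatureFn], FeatureFn]:
    """Hypothetical transform: multiply every feature by a constant."""
    def transform(feature_fn: FeatureFn) -> FeatureFn:
        def wrapped(params: dict, x: float) -> List[float]:
            return [factor * f for f in feature_fn(params, x)]
        return wrapped
    return transform


def compose(feature_fn: FeatureFn,
            transforms: Sequence[Callable[[FeatureFn], FeatureFn]]) -> FeatureFn:
    """Wrap the base function with each transform in order; nothing is evaluated yet."""
    for transform in transforms:
        feature_fn = transform(feature_fn)
    return feature_fn


def base_feature_fn(params: dict, x: float) -> List[float]:
    """Hypothetical base feature map with two features."""
    return [params["w"] * x, params["b"] + x]


feature_fn = compose(base_feature_fn, [scale_by(2.0), scale_by(0.5), scale_by(3.0)])
features = feature_fn({"w": 2.0, "b": 1.0}, 3.0)
```

The composed `feature_fn` is a single callable. In a setup like the one described above, that final callable is the only object that would need to be compiled.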

apax.bal.api.kernel_selection(model_dir: Path | List[Path], train_atoms: List[Atoms], pool_atoms: List[Atoms], base_fm_options: dict, selection_method: str, feature_transforms: list = [], selection_batch_size: int = 10, processing_batch_size: int = 64) → list[int][source]#

Main entry point for batch data selection. Currently, only last-layer gradient features and the MaxDist selection method are available. More can be added as needed, since this function is agnostic to the internals of the feature map and the selection method.

apax.bal.api.model_dir#

Path to the trained model or models which should be used to compute features.

Type:: Union[Path, List[Path]]

apax.bal.api.train_atoms#

List of ase.Atoms used to train the models.

Type:: List[Atoms]

apax.bal.api.pool_atoms#

List of ase.Atoms to select new data from.

Type:: List[Atoms]

apax.bal.api.base_fm_options#: Settings for the base feature map.

apax.bal.api.selection_method#: Currently only “max_dist” is supported.

apax.bal.api.feature_transforms#: Feature transforms to be applied on top of the base feature map transform. Examples include multiplication by or addition of a constant.

apax.bal.api.selection_batch_size#: Number of new data points to be selected from pool_atoms.

apax.bal.api.processing_batch_size#: Number of data points to compute the features for at once. Does not affect the results, only the speed of processing.
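
As a sketch of how the arguments might be assembled, the snippet below builds the keyword arguments for `kernel_selection`. The keys of `base_fm_options` are taken from the field names of `LastLayerGradientFeatures` documented on this page. That the dictionary is validated against those fields is an assumption. The call itself is left commented out because it needs apax, trained models, and `ase.Atoms` lists that are not defined here.

```python
# Keys mirror LastLayerGradientFeatures(name: Literal['ll_grad'], layer_name: str = 'dense_2').
# Assumption: kernel_selection builds the base feature map from this dictionary.
base_fm_options = {"name": "ll_grad", "layer_name": "dense_2"}

selection_kwargs = {
    "base_fm_options": base_fm_options,
    "selection_method": "max_dist",   # the only method listed as supported
    "feature_transforms": [],          # no extra transforms on top of the base map
    "selection_batch_size": 10,        # default from the signature
    "processing_batch_size": 64,       # default from the signature
}

# With apax installed and model_dir, train_atoms, pool_atoms defined elsewhere:
# from apax.bal.api import kernel_selection
# selected = kernel_selection(model_dir, train_atoms, pool_atoms, **selection_kwargs)
```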

class apax.bal.feature_maps.FeatureTransformation[source]#

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}#: A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {}#

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

class apax.bal.feature_maps.IdentityFeatures(*, name: Literal['identity'])[source]#

Identity feature map, intended for debugging purposes.

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}#: A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}#: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'name': FieldInfo(annotation=Literal['identity'], required=True)}#

class apax.bal.feature_maps.LastLayerGradientFeatures(*, name: Literal['ll_grad'], layer_name: str = 'dense_2')[source]#

Model transformation that computes the gradient of the output with respect to the specified layer. https://arxiv.org/pdf/2203.09410

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}#: A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}#: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'layer_name': FieldInfo(annotation=str, required=False, default='dense_2'), 'name': FieldInfo(annotation=Literal['ll_grad'], required=True)}#

apax.bal.feature_maps.extract_feature_params(params: dict, layer_name: str) → Tuple[dict, dict][source]#: Separate params into those belonging to a selected layer and the remaining ones.
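
A split of this kind can be sketched in pure Python for a nested parameter dictionary. The helper `split_params_by_layer` and the toy parameter layout are hypothetical. The actual apax function may traverse the parameter tree differently.

```python
from typing import Tuple


def split_params_by_layer(params: dict, layer_name: str) -> Tuple[dict, dict]:
    """Split a nested dict into (entries under layer_name, everything else)."""
    selected, remaining = {}, {}
    for key, value in params.items():
        if key == layer_name:
            # The whole subtree of the selected layer goes to the first dict.
            selected[key] = value
        elif isinstance(value, dict):
            # Recurse and keep the nesting structure in both outputs.
            sub_selected, sub_remaining = split_params_by_layer(value, layer_name)
            if sub_selected:
                selected[key] = sub_selected
            if sub_remaining:
                remaining[key] = sub_remaining
        else:
            remaining[key] = value
    return selected, remaining


# Hypothetical parameter tree with two dense layers.
params = {"params": {"dense_1": {"w": [1.0, 2.0]}, "dense_2": {"w": [3.0]}}}
feature_params, other_params = split_params_by_layer(params, "dense_2")
```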

class apax.bal.kernel.KernelMatrix(g: ndarray, n_train: int)[source]#

Matrix representation of a kernel defined by a feature map g: $K_{ij} = \sum_{k} g_{ik} g_{jk}$

score(idx: int) → ndarray[source]#: Computes the squared feature-space distance of sample i (given by idx) from all other samples j as $K_{ii} + K_{jj} - 2 K_{ij}$
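
The kernel and the distance formula can be checked numerically with a small NumPy sketch. The feature values and the helper `squared_distances` are hypothetical, and the full matrix is built explicitly only for clarity. The `KernelMatrix` class may avoid materializing it.

```python
import numpy as np

# Toy feature matrix g: 3 samples, 2 features each.
g = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [3.0, 4.0]])

# K_ij = sum_k g_ik g_jk
K = g @ g.T


def squared_distances(K, i):
    """K_ii + K_jj - 2 K_ij for a fixed i and all j."""
    diag = np.diag(K)
    return diag[i] + diag - 2.0 * K[i]


d0 = squared_distances(K, 0)

# Same quantity computed directly in feature space.
direct = ((g - g[0]) ** 2).sum(axis=1)
```

Here `d0` equals `direct`, which shows that the kernel expression is the squared Euclidean distance between feature vectors.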

apax.bal.selection.max_dist_selection(matrix: KernelMatrix, batch_size: int)[source]#

Iteratively selects samples from the pool which are most distant from all previously selected samples: $\operatorname{argmax}_{S \in \mathbb{X}_{rem}} \min_{S' \in \mathbb{X}_{sel}} d(S, S')$

https://arxiv.org/pdf/2203.09410.pdf https://doi.org/10.1039/D2DD00034B

apax.bal.selection.matrix#

Kernel used to compare structures.

Type:: KernelMatrix

apax.bal.selection.batch_size#

Number of new data points to be selected.

Type:: int
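
A greedy max-min selection of this kind can be sketched with NumPy as follows. This is an illustration under assumptions, not the apax implementation: `greedy_max_dist` is a hypothetical helper that works on a raw feature matrix, treats the first `n_train` rows as already selected, and returns indices relative to the pool.

```python
import numpy as np


def greedy_max_dist(g, n_train, batch_size):
    """Pick batch_size pool rows, each maximizing the distance to its nearest selected row."""
    sq_norms = (g ** 2).sum(axis=1)

    def sq_dist_to(i):
        # K_ii + K_jj - 2 K_ij for all j, without building the full kernel matrix.
        return sq_norms[i] + sq_norms - 2.0 * (g @ g[i])

    n = g.shape[0]
    available = np.zeros(n, dtype=bool)
    available[n_train:] = True  # only pool rows can be selected

    # Distance of every sample to its nearest training sample.
    min_dist = np.full(n, np.inf)
    for i in range(n_train):
        min_dist = np.minimum(min_dist, sq_dist_to(i))

    selected = []
    for _ in range(batch_size):
        scores = np.where(available, min_dist, -np.inf)
        new = int(np.argmax(scores))
        selected.append(new - n_train)  # index relative to the pool
        available[new] = False
        min_dist = np.minimum(min_dist, sq_dist_to(new))
    return selected


# One training sample at 0 and four pool samples on a line.
g = np.array([[0.0], [1.0], [5.0], [6.0], [10.0]])
selected = greedy_max_dist(g, n_train=1, batch_size=2)
```

With these toy features the first pick is the pool sample at 10, which is farthest from the training point. The second pick is the sample at 5, whose nearest selected neighbor is farther away than that of any other remaining sample.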

apax.bal.transforms.batch_features(feature_fn: Callable[[FrozenDict, dict], Array]) → Callable[[FrozenDict, dict], Array][source]#: Vectorizes a feature map over structures. Should be the last transformation applied to a feature map.

apax.bal.transforms.ensemble_features(feature_fn: Callable[[FrozenDict, dict], Array]) → Callable[[FrozenDict, dict], Array][source]#: Feature map transformation which averages the kernels of a model ensemble.
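
Averaging the kernels of an ensemble can be expressed through a single feature matrix. The NumPy sketch below checks one way to do this, namely concatenating the per-model features and scaling by $1/\sqrt{M}$ for M models. Whether `ensemble_features` is implemented this way is an assumption. The random features are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_samples, n_features = 3, 4, 5

# Hypothetical per-model feature matrices, shape (M, N, F).
g_per_model = rng.normal(size=(n_models, n_samples, n_features))

# Average of the individual kernels K^(m) = g^(m) g^(m)^T.
K_avg = np.mean([g @ g.T for g in g_per_model], axis=0)

# Equivalent single feature map: concatenate along the feature axis and rescale.
g_ens = np.concatenate(list(g_per_model), axis=1) / np.sqrt(n_models)
K_ens = g_ens @ g_ens.T
```

The two kernels agree up to floating-point error, so an averaged kernel can be handled by the same machinery as a single-model kernel.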