Batch Active Learning¶
- apax.bal.api.compute_features(feature_fn: Callable[[FrozenDict, dict], Array], dataset: OTFInMemoryDataset) ndarray[source]¶
Compute the features of a dataset.
- apax.bal.api.feature_fn¶
Function to compute the features with.
- apax.bal.api.dataset¶
Dataset to compute the features for.
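As a rough illustration of what computing features over a dataset involves, the sketch below applies a feature function to the inputs in chunks and stacks the results. It is not the apax implementation: `compute_features_batched`, `toy_feature_fn`, and the plain-array inputs are hypothetical stand-ins, whereas the documented `feature_fn` takes parameters and an input dictionary.

```python
import numpy as np


def compute_features_batched(feature_fn, inputs, batch_size=64):
    """Apply feature_fn to inputs chunk by chunk and stack the results."""
    chunks = []
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start:start + batch_size]
        chunks.append(np.asarray(feature_fn(batch)))
    return np.concatenate(chunks, axis=0)


# Hypothetical feature function: maps (n, 3) inputs to (n, 6) features.
def toy_feature_fn(x):
    return np.concatenate([x, x**2], axis=1)


rng = np.random.default_rng(0)
inputs = rng.normal(size=(10, 3))
features = compute_features_batched(toy_feature_fn, inputs, batch_size=4)
```

Because each row is processed independently in this sketch, the chunk size changes only how much work is done per call, not the resulting feature matrix.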
- apax.bal.api.create_feature_fn(model: EnergyModel, params: FrozenDict, base_feature_map: FeatureTransformation, feature_transforms=[], is_ensemble: bool = False)[source]¶
Converts a model into a feature map, applies the requested transformations, and sets it up for computing the features of a dataset.
All transformations are applied on the feature function, not on computed features. Only the final function is jit compiled.
- apax.bal.api.model¶
Model to be transformed.
- Type:
EnergyModel
- apax.bal.api.params¶
Model parameters
- Type:
FrozenDict
- apax.bal.api.base_feature_map¶
Class that transforms the model into a FeatureMap
- Type:
FeatureTransformation
- apax.bal.api.is_ensemble¶
Whether to apply the ensemble transformation, i.e. an averaging of the kernels of the ensemble members.
- Type:
bool
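To make "averaging of kernels" concrete, the following sketch shows one way a kernel average over ensemble members can be expressed at the feature level. It assumes each member yields its own feature matrix; the random matrices and all variable names are illustrative, and the source does not state that apax implements the ensemble transformation this way.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_models = 5, 4, 3

# One feature matrix per ensemble member (random stand-ins).
member_features = [
    rng.normal(size=(n_samples, n_features)) for _ in range(n_models)
]

# Average of the per-member kernels K_m = g_m g_m^T.
avg_kernel = sum(g @ g.T for g in member_features) / n_models

# The same kernel results from concatenating the member features along the
# feature axis and scaling them by 1 / sqrt(n_models).
stacked = np.concatenate(member_features, axis=1) / np.sqrt(n_models)
stacked_kernel = stacked @ stacked.T
```

The equivalence means an averaged kernel never has to be formed explicitly; a single, wider feature matrix carries the same information.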
- apax.bal.api.kernel_selection(model_dir: Path | List[Path], train_atoms: List[Atoms], pool_atoms: List[Atoms], base_fm_options: dict, selection_method: str, feature_transforms: list = [], selection_batch_size: int = 10, processing_batch_size: int = 64) list[int][source]¶
Main function to facilitate batch data selection. Currently, only the last-layer gradient features and the MaxDist selection method are available. More can be added as needed, since this function is agnostic to the internals of the feature map and selection method.
- apax.bal.api.model_dir¶
Path to the trained model or models which should be used to compute features.
- Type:
Union[Path, List[Path]]
- apax.bal.api.train_atoms¶
List of ase.Atoms used to train the models.
- Type:
List[Atoms]
- apax.bal.api.pool_atoms¶
List of ase.Atoms to select new data from.
- Type:
List[Atoms]
- apax.bal.api.base_fm_options¶
Settings for the base feature map.
- apax.bal.api.selection_method¶
Currently only “max_dist” is supported.
- apax.bal.api.feature_transforms¶
Feature transforms to be applied on top of the base feature map transform. Examples include multiplication by, or addition of, a constant.
- apax.bal.api.selection_batch_size¶
Number of new data points to be selected from pool_atoms.
- apax.bal.api.processing_batch_size¶
Number of data points to compute the features for at once. Does not affect the results, only the speed of processing.
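The idea of applying feature transforms to the feature function, rather than to already computed features, can be sketched as plain function composition. Everything here is hypothetical (`scale_by`, `shift_by`, `apply_transforms`, `base_feature_fn`); the sketch only illustrates the pattern and does not reflect the interface apax expects for entries of `feature_transforms`.

```python
import numpy as np


def scale_by(constant):
    """Return a transform that wraps a feature function and scales its output."""
    def transform(feature_fn):
        def wrapped(x):
            return constant * feature_fn(x)
        return wrapped
    return transform


def shift_by(constant):
    """Return a transform that wraps a feature function and adds a constant."""
    def transform(feature_fn):
        def wrapped(x):
            return feature_fn(x) + constant
        return wrapped
    return transform


def apply_transforms(base_fn, transforms):
    """Wrap base_fn with each transform in order; no features are computed yet."""
    fn = base_fn
    for transform in transforms:
        fn = transform(fn)
    return fn


# Hypothetical base feature function.
def base_feature_fn(x):
    return np.asarray(x, dtype=float) ** 2


feature_fn = apply_transforms(base_feature_fn, [scale_by(2.0), shift_by(1.0)])
result = feature_fn(np.array([1.0, 2.0, 3.0]))  # 2 * x**2 + 1
```

Composing functions first leaves a single callable at the end, which is the natural place to apply compilation once.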
- class apax.bal.feature_maps.FeatureTransformation[source]¶
- model_config = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class apax.bal.feature_maps.FullGradientRPFeatures(*, name: Literal['full_grad_rp'] = 'full_grad_rp', num_rp: int = 512)[source]¶
Model transformation which computes the gradient of the output wrt. all parameters and applies a Gaussian random projection for dimensionality reduction. https://arxiv.org/pdf/2203.09410
- Parameters:
num_rp (int) – Dimensionality to reduce the features to.
- model_config = {'extra': 'forbid'}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
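A Gaussian random projection of the kind mentioned for this feature map can be sketched with NumPy as follows. The function name, the seed handling, and the 1/sqrt(num_rp) scaling are assumptions made for this illustration, and the random matrix merely stands in for full-parameter gradients; none of it is taken from the apax source.

```python
import numpy as np


def gaussian_random_projection(features, num_rp, seed=0):
    """Project (n, d) features down to (n, num_rp) with a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    n_features = features.shape[1]
    # Entries drawn from N(0, 1 / num_rp), so inner products are preserved
    # in expectation.
    projection = rng.normal(
        scale=1.0 / np.sqrt(num_rp), size=(n_features, num_rp)
    )
    return features @ projection


rng = np.random.default_rng(1)
full_grads = rng.normal(size=(8, 2000))  # stand-in for full gradients
reduced = gaussian_random_projection(full_grads, num_rp=512)

# The kernel built from the reduced features approximates the exact one.
exact_kernel = full_grads @ full_grads.T
approx_kernel = reduced @ reduced.T
rel_error = np.linalg.norm(approx_kernel - exact_kernel) / np.linalg.norm(exact_kernel)
```

A larger `num_rp` tightens the approximation at the cost of wider feature vectors.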
- class apax.bal.feature_maps.IdentityFeatures(*, name: Literal['identity'])[source]¶
Identity feature map, intended for debugging purposes.
- model_config = {'extra': 'forbid'}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class apax.bal.feature_maps.LastLayerForceFeatures(*, name: Literal['ll_force_feat'] = 'll_force_feat', layer_name: str = 'dense_2', strategy: str = 'raw')[source]¶
Model transformation which computes the Jacobian of the forces wrt. the specified layer. For BAL, the strategy “flatten” has to be selected.
- Parameters:
layer_name (str) – Name of the layer wrt. which to take the Jacobian.
strategy (str) – One of “raw”, “sum”, or “flatten”. Only “flatten” seems to work for BAL; “raw” is required for LLPR.
- model_config = {'extra': 'forbid'}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class apax.bal.feature_maps.LastLayerGradientFeatures(*, name: Literal['ll_grad'] = 'll_grad', layer_name: str = 'dense_2')[source]¶
Model transformation which computes the gradient of the output wrt. the specified layer. https://arxiv.org/pdf/2203.09410
- Parameters:
layer_name (str) – Name of the layer wrt. which to take the gradient.
- model_config = {'extra': 'forbid'}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
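For intuition, consider a toy network whose last layer is linear with a scalar output. In that case the gradient of the output wrt. the last-layer weights is simply the preceding hidden activation, and the gradient wrt. the bias is 1. The sketch below assumes exactly this toy architecture; the functions and shapes are illustrative and unrelated to the actual apax model.

```python
import numpy as np


def hidden(x, w1, b1):
    """Hidden activations of a toy one-hidden-layer network."""
    return np.tanh(x @ w1 + b1)


def model(x, w1, b1, w2, b2):
    """Scalar output per sample: linear last layer on top of the hidden layer."""
    return hidden(x, w1, b1) @ w2 + b2


def last_layer_gradient_features(x, w1, b1):
    """Gradient of the output wrt. (w2, b2): hidden activations plus a 1 for the bias."""
    h = hidden(x, w1, b1)
    ones = np.ones((x.shape[0], 1))
    return np.concatenate([h, ones], axis=1)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
w1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
w2, b2 = rng.normal(size=5), 0.1

features = last_layer_gradient_features(x, w1, b1)

# Finite-difference check of the gradient wrt. the first last-layer weight.
eps = 1e-6
w2_shifted = w2.copy()
w2_shifted[0] += eps
fd_grad = (model(x, w1, b1, w2_shifted, b2) - model(x, w1, b1, w2, b2)) / eps
```

Here `fd_grad` should agree with the first feature column up to floating-point error, since the output is linear in the last-layer weights.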
- apax.bal.feature_maps.extract_feature_params(params: dict, layer_name: str) Tuple[dict, dict][source]¶
Separate params into those belonging to a selected layer and the remaining ones.
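A minimal sketch of such a split on a nested parameter dictionary is shown below. The helper `split_params_by_layer` and the example layout of `params` are hypothetical; the real parameter tree and the exact behaviour of `extract_feature_params` may differ.

```python
def split_params_by_layer(params, layer_name):
    """Split a nested dict into (selected, remaining) by matching a key name."""
    selected, remaining = {}, {}
    for key, value in params.items():
        if key == layer_name:
            selected[key] = value
        elif isinstance(value, dict):
            sub_selected, sub_remaining = split_params_by_layer(value, layer_name)
            # Keep the nesting, but only for branches that contain something.
            if sub_selected:
                selected[key] = sub_selected
            if sub_remaining:
                remaining[key] = sub_remaining
        else:
            remaining[key] = value
    return selected, remaining


# Hypothetical parameter tree with two dense layers.
params = {
    "params": {
        "dense_1": {"w": [1.0, 2.0], "b": [0.0]},
        "dense_2": {"w": [3.0], "b": [0.5]},
    }
}
feature_params, other_params = split_params_by_layer(params, "dense_2")
```

Both outputs keep the original nesting, so in this sketch they can be merged back into a full parameter tree when the model is evaluated.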
- class apax.bal.kernel.KernelMatrix(g: ndarray, n_train: int)[source]¶
Matrix representation of a kernel defined by a feature map $g$: $K_{ij} = \sum_{k} g_{ik} g_{jk}$
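In matrix form, the kernel defined above is the product of the feature matrix with its transpose, and it induces a squared distance between samples. The NumPy sketch below illustrates both; the random feature matrix and the helper `kernel_distance_sq` are illustrative and not part of the `KernelMatrix` class.

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=(6, 4))  # 6 samples, 4 feature dimensions

# K_ij = sum_k g_ik g_jk
kernel = g @ g.T


def kernel_distance_sq(kernel, i, j):
    """Squared distance induced by the kernel: K_ii + K_jj - 2 K_ij."""
    return kernel[i, i] + kernel[j, j] - 2.0 * kernel[i, j]


d01 = kernel_distance_sq(kernel, 0, 1)  # equals ||g_0 - g_1||^2
```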
- apax.bal.selection.max_dist_selection(matrix: KernelMatrix, batch_size: int | None = None)[source]¶
Iteratively selects samples from the pool which are most distant from all previously selected samples: $\arg\max_{S \in \mathbb{X}_{rem}} \min_{S' \in \mathbb{X}_{sel}} d(S, S')$
https://arxiv.org/pdf/2203.09410.pdf https://doi.org/10.1039/D2DD00034B
- apax.bal.selection.matrix¶
Kernel used to compare structures.
- Type:
KernelMatrix
- apax.bal.selection.batch_size¶
Number of new data points to be selected.
- Type:
int
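The greedy rule described for `max_dist_selection` can be sketched directly on a feature matrix. The version below makes several assumptions that the source does not confirm: the first `n_train` rows are training samples, distances are Euclidean in feature space, and returned indices are relative to the pool. `max_dist_selection_sketch` is a hypothetical name and not the apax function.

```python
import numpy as np


def max_dist_selection_sketch(features, n_train, batch_size):
    """Greedy farthest-point selection from the pool rows of `features`."""
    sq_norms = np.sum(features**2, axis=1)

    def sq_dists_to(i):
        # Squared Euclidean distance from every sample to sample i.
        return sq_norms + sq_norms[i] - 2.0 * features @ features[i]

    n_total = features.shape[0]
    batch_size = min(batch_size, n_total - n_train)

    # Distance of each sample to its closest already-selected sample,
    # initialised with the training set.
    min_sq_dist = np.full(n_total, np.inf)
    for i in range(n_train):
        min_sq_dist = np.minimum(min_sq_dist, sq_dists_to(i))
    min_sq_dist[:n_train] = -np.inf  # training samples are never selected

    selected = []
    for _ in range(batch_size):
        idx = int(np.argmax(min_sq_dist))
        selected.append(idx - n_train)  # index relative to the pool
        min_sq_dist = np.minimum(min_sq_dist, sq_dists_to(idx))
        min_sq_dist[idx] = -np.inf  # do not pick the same sample twice
    return selected


# One training sample at 0 and four pool samples on a line.
features = np.array([[0.0], [1.0], [10.0], [9.0], [-5.0]])
selected = max_dist_selection_sketch(features, n_train=1, batch_size=2)
```

In this example the pool sample at 10 is picked first because it is farthest from the training point, followed by the sample at -5, which is then farthest from everything selected so far.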