ML_IWEIGHT


ML_IWEIGHT = [integer]
Default: ML_IWEIGHT = 3 

Description: This tag controls which procedure is used for normalizing and weighting the energies, forces, and stresses in the machine learning force field method.


To achieve optimal training, it is important to normalize the available data. Furthermore, it is sometimes desirable to emphasize some training quantities over others; for example, one might want excellent force predictions even at the cost of some energy and stress accuracy. How normalization and weighting are performed is controlled by the ML_IWEIGHT tag together with the weighting parameters ML_WTOTEN, ML_WTIFOR, and ML_WTSIF for energies, forces, and stresses, respectively. The following procedures can be selected via ML_IWEIGHT:

  • ML_IWEIGHT = 1: Manual control over normalization/weighting: the unnormalized energies, forces, and stress tensors in the training data are divided by the weights set via the tags ML_WTOTEN (eV/atom), ML_WTIFOR (eV/Å), and ML_WTSIF (kbar), respectively.
  • ML_IWEIGHT = 2: Normalization via global standard deviations: The energies, forces, and stresses are normalized by their respective standard deviation over the entire training data. Then, the normalized quantities are weighted by ML_WTOTEN, ML_WTIFOR and ML_WTSIF when they are processed for learning in the design matrix (see this section). In this case the values of ML_WTOTEN, ML_WTIFOR and ML_WTSIF are unitless quantities.
  • ML_IWEIGHT = 3: Normalization via averages over subset standard deviations: Same as ML_IWEIGHT = 2, but the training data is divided into individual subsets. For each subset, the standard deviations are calculated separately, and the energies, forces, and stresses are then normalized using the average of the standard deviations over all subsets. Finally, as for ML_IWEIGHT = 2, the normalized quantities are multiplied by ML_WTOTEN, ML_WTIFOR, and ML_WTSIF for learning. By default (ML_LUSE_NAMES=.FALSE.), the division into subsets is based on the atom types and the number of atoms per type: if two systems contain the same atom types and the same number of atoms per type, they are considered to belong to the same subset. To divide them further, set ML_LUSE_NAMES=.TRUE. and choose different system names in the first line of the POSCAR file (see the sketch after this list). This can be useful if training is performed for widely different materials, for instance, different phases with widely different energies. Without the finer subset assignment, the overall energy standard deviation might become large, reducing the weight of the energies of individual subsets too much.

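The subset assignment via system names can be set up as in the following minimal sketch; the tags are those discussed above, while the system names Si_diamond and Si_beta_tin are purely illustrative placeholders:

   INCAR:
      ML_LMLFF      = .TRUE.    ! enable the machine learning force field
      ML_IWEIGHT    = 3         ! default: subset-averaged standard deviations
      ML_LUSE_NAMES = .TRUE.    ! assign subsets by the POSCAR system name

   POSCAR of phase 1 (first line = system name):
      Si_diamond
      ...

   POSCAR of phase 2 (first line = system name):
      Si_beta_tin
      ...
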
For ML_IWEIGHT = 2, 3 the weights are unitless multipliers of the data, whereas for ML_IWEIGHT = 1 they carry units. All three methods provide unitless energies, forces, and stress tensors, which are then passed to the learning algorithm. Although the defaults are usually quite sensible, it can be useful to explore different weights. For instance, if vibrational frequencies need to be reproduced accurately, we found it helpful to increase ML_WTIFOR to 10-100. On the other hand, if the energy differences between different phases need to be described accurately by the force field, it might be useful to increase ML_WTOTEN to around 10-100.
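
If, for example, accurate vibrational frequencies are the goal, the recommendation above translates into an INCAR along the following lines; the specific value of ML_WTIFOR is only an illustration and should be tested for the system at hand:

   ML_LMLFF   = .TRUE.
   ML_IWEIGHT = 3          ! keep the default subset-based normalization
   ML_WTIFOR  = 50.0       ! emphasize forces (suggested range 10-100)
   ML_WTOTEN  = 1.0        ! leave energy ...
   ML_WTSIF   = 1.0        ! ... and stress weights at their default values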

Tip: On-the-fly learning implies that training structures accumulate along the running MD trajectory. Hence, the standard deviations of the energies, forces, and stresses also change over time and are recalculated whenever a learning step is triggered. We highly recommend ML_IWEIGHT = 3, because it ensures that learning is always performed on an adequately normalized data set.
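
A minimal on-the-fly training setup can therefore simply rely on the default. The sketch below assumes a VASP version that provides the ML_MODE tag (older versions use ML_ISTART instead) and is meant to be combined with the usual MD tags (IBRION = 0, NSW, POTIM, TEBEG, ...):

   ML_LMLFF   = .TRUE.     ! machine-learned force field on
   ML_MODE    = train      ! on-the-fly training along the MD run
   ML_IWEIGHT = 3          ! default: renormalize with subset standard
                           ! deviations at every learning step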

Related tags and articles

ML_LMLFF, ML_WTOTEN, ML_WTIFOR, ML_WTSIF, ML_LUSE_NAMES
