Machine learning force field calculations: Basics

From VASP Wiki
Revision as of 08:25, 5 May 2023

The machine-learned force fields (MLFF) feature of VASP allows you to generate, improve, modify, and apply force fields based on machine-learning techniques for your system of interest. Although there are many tunable parameters, i.e., MLFF-related INCAR tags, the default values have been carefully selected to simplify the initial creation of a machine-learned force field. Hence, only minimal additional effort should be required to get started with this feature. Nevertheless, because machine learning involves multiple steps, e.g., at a minimum separate training and application stages, this page explains the basic tags controlling the corresponding modes of operation. If you are already familiar with the basic usage of the MLFF feature, have a closer look at the best practices page, which offers more in-depth advice for tuning MLFF settings. For more information about the theory and algorithms, please visit the MLFF theory page.

Step-by-step instructions

The on-the-fly training feature of VASP is based on molecular dynamics simulations to sample training structures. Piece by piece, a data set is automatically assembled and used to generate a machine-learned force field whenever feasible. At each time step, the current force field predicts the energy, the forces, and the corresponding Bayesian error estimates. Simply put, if the error is above a certain threshold, another ab initio calculation is performed and the reference energy and forces are added to the training data set. Otherwise, the ab initio step is omitted and the system is propagated via the MLFF predictions. As the force field improves along the trajectory, many ab initio steps can be avoided and the MD simulation is significantly accelerated. Ultimately, the on-the-fly training results in an MLFF which is ready for production, i.e., running an MD simulation in prediction-only mode. The following steps outline the path from start to production run:
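Schematically, each time step during on-the-fly training proceeds along these lines (a simplified sketch of the logic described above, not the literal implementation):

for each MD step:
    predict energy, forces and Bayesian error estimates with the current MLFF
    if the predicted error exceeds the threshold:
        perform an ab initio calculation
        add the reference energy and forces to the training data set
        (re)train the force field whenever feasible
    propagate the atoms with the best available forces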

Step 1: Prepare an MD run

Prepare an ab initio molecular dynamics run with your desired starting configuration in the POSCAR file and an appropriate setup in the INCAR, KPOINTS, and POTCAR files.
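As a minimal sketch, the MD-related part of the INCAR for such an ab initio run might contain the following standard tags; all values are purely illustrative and must be adapted and converged for your system:

IBRION = 0      ! perform molecular dynamics
MDALGO = 2      ! Nose-Hoover thermostat (NVT ensemble)
SMASS = 0       ! Nose-mass parameter, see the documentation
NSW = 1000      ! number of MD steps
POTIM = 1.5     ! time step in fs
TEBEG = 400     ! initial temperature in K
ISYM = 0        ! switch off symmetry, recommended for MD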

Step 2: Start on-the-fly training from scratch

The machine-learned force field method is configured with numerous INCAR tags, which are easily recognized by their ML_ prefix. In general, to enable any MLFF feature the following INCAR tag needs to be set:

ML_LMLFF = .TRUE.

If this tag is not set to .TRUE., all other MLFF-related INCAR tags are ignored and VASP performs regular ab initio calculations. Furthermore, to start on-the-fly training we additionally need to set the ML_MODE "super"-tag:

ML_MODE = train

When executed in this train mode, VASP automatically performs an ab initio calculation whenever necessary and otherwise relies on the predictions of the machine-learned force field. The usual output files, e.g., OUTCAR and XDATCAR, are created along the MD trajectory. In addition, MLFF-related files are written to disk, the most important ones being:

  • ML_LOGFILE: the log file for all MLFF-related details; the training status, current errors, and other important quantities can be extracted from here.
  • ML_ABN: contains the collected training structures and the list of selected local reference configurations.
  • ML_FFN: a binary file containing the current machine-learned force field.

All three files are repeatedly updated during the MD simulation. After NSW time steps have been carried out, the ML_ABN and ML_FFN files contain the complete training data set and the final machine-learned force field, respectively. Training errors can be found in the ML_LOGFILE by searching for lines starting with ERR.
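Combining steps 1 and 2, the INCAR of an on-the-fly training run could contain, as a sketch (the MD tag values are again illustrative):

ML_LMLFF = .TRUE.
ML_MODE = train
IBRION = 0
NSW = 1000
POTIM = 1.5
TEBEG = 400

The training errors can then be monitored during or after the run, e.g., via grep ERR ML_LOGFILE.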

Step 3 (optional): Continue on-the-fly training from existing training database

In principle, step 2 above may already yield a force field ready for further processing and application. However, most of the time additional on-the-fly training iterations are necessary, for example, to extend the training database with structures at higher temperatures or different densities, or because the force field must capture different atom-type compositions or phases, e.g., a liquid and multiple solid phases. This can be achieved by on-the-fly continuation runs: at the beginning, a force field is generated from the previous training data and, if applicable, used for predictions in the MD run. Like in step 2, the force field is trained along the trajectory. However, it also retains its applicability to the structures of the previous on-the-fly run. Finally, the continuation training results in an MLFF capable of predicting structures of both runs. To continue on-the-fly training, first set up your new starting POSCAR structure, e.g., by copying from the CONTCAR file. The new structure may share some atom types with the previous run, but this is not a requirement. It is also possible to continue training with completely different atom types in the POSCAR file (remember to set up your POTCAR accordingly). The only other action required is to copy the existing database to the ML_AB file:

cp ML_ABN ML_AB

Leave ML_MODE = train unchanged and restart VASP. The log file will contain a section describing the existing data set, and after the initial generation of a force field the regular on-the-fly procedure continues. In the end, the resulting ML_ABN contains the training structures from both on-the-fly runs. Similarly, the ML_FFN file is a combined force field. In the presence of an ML_AB file, the train mode always performs a continuation run. If you would like to start from scratch, simply remove the ML_AB file from the execution directory.
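Assuming the standard file names, a continuation run for the same system can be sketched as follows; the parallel launch command and core count are placeholders for your environment:

cp CONTCAR POSCAR   # start from the last structure of the previous run
cp ML_ABN ML_AB     # provide the existing training database
mpirun -np 16 vasp_std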

Tip: Apply this strategy repeatedly in order to systematically improve your machine-learned force field, e.g., first train on water only, then on sodium chloride, and finally on the combination of both.

Step 4: Refit for fast prediction mode

Once on-the-fly training has succeeded and the result matches your expectations with respect to applicability and residual errors, one final step is required before the force field should be applied in prediction-only MD runs: refitting for fast prediction mode. Copy the final data set to ML_AB once again:

cp ML_ABN ML_AB

Also, set in the INCAR file:

ML_MODE = refit

Running VASP will create a new ML_FFN which can finally be used for production.
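Since no MD steps are performed during refitting, a minimal INCAR sketch for this step only needs the MLFF tags; keep the remaining electronic-structure settings consistent with the training runs:

ML_LMLFF = .TRUE.
ML_MODE = refit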

Important: Although it is technically possible to continue directly with step 5 given an ML_FFN file from steps 2 or 3, this is strongly discouraged. Without the refitting step, VASP cannot enable the fast prediction mode, which comes with a speedup factor of approximately 20 to 100. Check the ASCII header of the ML_FFN file to verify whether the contained force field supports fast prediction.

Step 5: Applying the MLFF in production runs

The machine-learned force field obtained from step 4 is now ready to be applied in the prediction-only mode. First, copy the ML_FFN file:

cp ML_FFN ML_FF

In the INCAR file set

ML_MODE = run

With this choice, VASP will only use the predictions of the machine-learned force field; no ab initio calculations are performed. The execution time per time step will be orders of magnitude lower compared with corresponding ab initio runs.
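As a sketch, the INCAR of such a production run could contain the following; the values are illustrative, and note that far more MD steps are affordable here than in the ab initio case:

ML_LMLFF = .TRUE.
ML_MODE = run
IBRION = 0
NSW = 100000
POTIM = 1.5
TEBEG = 400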

Tip: The MLFF can be transferred to larger system sizes, i.e., you may duplicate your simulation box to benefit from improved statistics. Because the method scales linearly with the number of atoms, you can easily estimate the impact on computational demand: a 2×2×2 supercell, for example, contains 8 times as many atoms and therefore costs roughly 8 times as much per MD step.

Important general remarks

On-the-fly learning can be significantly more involved than, e.g., a single-point electronic calculation, because it combines multiple features of VASP. Each part requires a proper setup via the available INCAR tags. A misconfiguration in one part of the calculation may severely degrade the quality of the resulting machine-learned force field. In the worst case, successful training may even be impossible. To be more specific, on-the-fly learning requires control over the following aspects:

  • Consistent convergence
All ab initio reference data collected via on-the-fly training must be consistent and well-converged with respect to the single-point electronic calculation setup. Mind the different temperatures and densities targeted in MD runs. A machine-learned force field can only reproduce a single potential energy landscape!
  • Appropriate molecular dynamics settings
Consider the choice of thermodynamic ensemble, thermostat and barostat settings, and an appropriate time step.
  • Proper setup of machine-learned force field parameters
Mind system-dependent parameters like the cutoff radius or atomic environment descriptor resolution.
  • Control over data set generation via on-the-fly learning
Monitor and control how much ab initio reference data is harvested via automatic Bayesian threshold determination and sparsification.
  • Quality control
Establish reasonable expectations regarding residual training errors. Benchmark the quality of resulting force fields by comparison of predictions with known quantities (from ab initio).
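To illustrate the point about system-dependent parameters, the main descriptor-related tags can be inspected and, if necessary, adjusted; the values below are merely an example of the kind of settings to check, not a recommendation:

ML_RCUT1 = 8.0   ! cutoff radius of the radial descriptor (Angstrom)
ML_RCUT2 = 5.0   ! cutoff radius of the angular descriptor (Angstrom)
ML_MB = 1500     ! maximum number of local reference configurations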
Tip: Begin by thoroughly familiarizing yourself with pure ab initio calculations for your system before attempting to generate a machine-learned force field from scratch. Once you are confident in controlling the convergence, proceed to run a brief MD simulation without machine learning assistance. Validate whether the results align with expected values regarding conservation principles and so forth. Only then, move forward with the machine learning aspects of the calculation.

Parallelization

At present, VASP provides only MPI-based parallelization for the MLFF feature. Therefore, any operational mode relying exclusively on the MLFF code, such as prediction-only MD simulations (ML_MODE = run) and local reference configuration selection (ML_MODE = select), cannot leverage alternative forms of parallelization like OpenACC offloading to GPUs or an MPI/OpenMP hybrid approach. Conversely, a typical on-the-fly training run involves both MLFF generation and ab initio computations. When the latter dominates the computational demand, non-MPI parallelization remains practical.