Optimizing the parallelization: Difference between revisions

From VASP Wiki

Revision as of 12:20, 12 April 2022

To find the optimal parallelization setup of a VASP calculation, it is necessary to run tests for each system, algorithm and computer architecture. Below, we offer general advice on how to optimize the parallelization.

Optimizing the parallelization

For any repetitive task, a few iteration steps give a good estimate of the performance of the full calculation. For example, run only a few electronic or ionic self-consistency steps (without reaching full convergence) and compare the timings of various parallelization setups.
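A minimal sketch of how the timings of such short test runs could be compared. It assumes that the OUTCAR file reports one "LOOP:" line with cpu and real time per electronic step; the regular expression, the function name mean_loop_time, and the sample text are illustrative assumptions and should be checked against your own output.

```python
import re

# Assumed format of the per-step timing line in OUTCAR; verify before use.
LOOP_RE = re.compile(r"LOOP:\s+cpu time\s*([\d.]+):\s+real time\s*([\d.]+)")


def mean_loop_time(outcar_text: str) -> float:
    """Return the average real time per electronic step found in the text."""
    times = [float(m.group(2)) for m in LOOP_RE.finditer(outcar_text)]
    if not times:
        raise ValueError("no LOOP timing lines found")
    return sum(times) / len(times)


# Hypothetical excerpt of a short test run (two electronic steps).
sample = """\
      LOOP:  cpu time     10.0000: real time     10.5000
      LOOP:  cpu time      9.0000: real time      9.5000
"""

print(mean_loop_time(sample))
```

Running this for each test setup and comparing the averages gives a rough ranking without converging any of the calculations.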

Keep the test calculation as close as possible to the production calculation. This includes both the physical system (atoms, cell size, cutoff, ...) and the computational hardware (CPUs, interconnect, number of nodes, ...). If too many parameters differ, the parallel configuration may not be transferable to the production calculation.

For any calculation involving electronic minimization, it can be useful to first run VASP with ALGO=None set in the INCAR file. That is because with ALGO=None the computational setup for the electronic minimization is done without actually performing the minimization. For instance, the FFTs are planned, and the irreducible k points of the first Brillouin zone are constructed. Therefore, some parameters, e.g., the default number of Kohn-Sham orbitals (NBANDS) and the total number of plane waves, are written to the OUTCAR file while using barely any computational time.
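As a rough sketch, the default NBANDS could be picked up from the OUTCAR file of such an ALGO=None run with a few lines of Python. The layout of the sample line and the helper name read_default_nbands are assumptions for illustration; only the pattern "NBANDS=" followed by an integer is relied on.

```python
import re


def read_default_nbands(outcar_text: str) -> int:
    """Return the first NBANDS value reported in the given OUTCAR text."""
    match = re.search(r"NBANDS\s*=\s*(\d+)", outcar_text)
    if match is None:
        raise ValueError("NBANDS not found")
    return int(match.group(1))


# Hypothetical excerpt; the exact spacing in a real OUTCAR may differ.
sample = (
    "   k-points           NKPTS =     10"
    "   k-points in BZ     NKDIM =     10"
    "   number of bands    NBANDS=     64\n"
)

print(read_default_nbands(sample))
```

In practice the text would come from reading the OUTCAR file of the test run rather than from an inline string.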

In our experience, VASP yields the best performance by combining multiple parallelization options. This is because the parallel efficiency of each level drops near its limit. By default, VASP distributes the bands (NBANDS) over the available MPI ranks. But it is often beneficial to add parallelization of the FFTs (NCORE), parallelization over k points (KPAR), and parallelization over separate calculations (IMAGES). Additionally, there are some parallelization options for specific algorithms in VASP, e.g., NOMEGAPAR for parallelization over imaginary frequency points in GW and RPA calculations. In summary, VASP parallelizes with

total ranks = ranks parallelizing bands × NCORE × KPAR × IMAGES × other algorithm-dependent tags.
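The following sketch checks whether a given combination of settings is consistent with the number of MPI ranks. It assumes that the parallelization levels multiply (total ranks = ranks parallelizing bands × NCORE × KPAR × IMAGES) and ignores algorithm-specific tags such as NOMEGAPAR; the function name band_ranks is hypothetical.

```python
def band_ranks(total_ranks: int, ncore: int = 1, kpar: int = 1, images: int = 1) -> int:
    """Return the number of ranks left for band parallelization.

    Assumes total_ranks = band ranks * NCORE * KPAR * IMAGES and raises
    if the settings do not divide the total number of ranks evenly.
    """
    group = ncore * kpar * images
    if total_ranks % group != 0:
        raise ValueError(
            f"{total_ranks} ranks cannot be split evenly by "
            f"NCORE*KPAR*IMAGES = {group}"
        )
    return total_ranks // group


# Example: 128 MPI ranks with NCORE=4 and KPAR=2 leave 16 ranks for the bands.
print(band_ranks(128, ncore=4, kpar=2))
```

A setup that raises here is not necessarily rejected by VASP, but it is a hint that the chosen values do not match the rank count cleanly.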

First, create a list of the relevant INCAR tags for parallelization for the specific calculation. Then, read the documentation for each of the relevant tags to find reasonable settings to run test calculations.

For example, in the case of electronic minimization, aim to set the number of ranks to the default value of NBANDS divided by a small integer. Note that VASP will increase NBANDS to accommodate the number of ranks. Choose NCORE as a factor of the cores per node to avoid communicating between nodes for the FFTs, but mind that NCORE cannot be set with OpenMP threading and/or the OpenACC GPU port. The k-point parallelization is efficient but requires additional memory. Given sufficient memory, increase KPAR up to the number of irreducible k points. Keep in mind that KPAR should be a factor of the number of k points. Finally, use the IMAGES tag to split several VASP runs into separate calculations. The limit is dictated by the number of desired calculations.
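These rules of thumb can be turned into a short list of candidate settings to test. The sketch below is only an illustration: the function names, the limit of 4 on the "small integer", and the example numbers are assumptions, not recommendations from VASP.

```python
def divisors(n: int) -> list[int]:
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def candidate_settings(default_nbands: int, cores_per_node: int,
                       n_kpoints: int, max_divisor: int = 4) -> dict:
    """Suggest settings to benchmark, following the rules of thumb above."""
    return {
        # Rank counts: default NBANDS divided by a small integer (exact only).
        "ranks": [default_nbands // k for k in range(1, max_divisor + 1)
                  if default_nbands % k == 0],
        # NCORE: factors of the cores per node keep the FFTs inside a node.
        "NCORE": divisors(cores_per_node),
        # KPAR: factors of the number of irreducible k points.
        "KPAR": divisors(n_kpoints),
    }


# Hypothetical system: 64 default bands, 16 cores per node, 6 k points.
settings = candidate_settings(default_nbands=64, cores_per_node=16, n_kpoints=6)
for tag, values in settings.items():
    print(tag, values)
```

Each combination from these lists is a starting point for a short test run; memory limits for KPAR and the restrictions on NCORE with OpenMP or OpenACC still have to be checked by hand.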

Related tags and articles

Parallelization, KPAR, NCORE, IMAGES