Parallelization

For many complex problems, a single core is not enough to finish the calculation in a reasonable time. VASP makes use of parallel machines by splitting the calculation into many tasks. By default, VASP distributes the number of bands (NBANDS) over the available cores. But it is often beneficial to add parallelization over the FFTs (NCORE), the k points (KPAR), and separate calculations (IMAGES). All these tags default to 1 and divide the number of cores among the parallelization options:

    total cores = cores parallelizing bands × NCORE × KPAR × IMAGES × other algorithm-dependent tags

There are also additional parallelization options for some algorithms in VASP.
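
As a minimal illustration of how the cores are divided (the rank count and tag values are hypothetical, not a recommendation), consider a run on 128 MPI ranks with the following INCAR settings:

    NCORE = 4    # 4 cores share the FFTs of each band
    KPAR  = 2    # the k points are split into 2 groups
    # IMAGES is left at its default of 1 (a single calculation)

With these settings, each of the 2 k-point groups uses 128 / 2 = 64 cores, and within a group 64 / 4 = 16 bands are worked on simultaneously, each by 4 cores.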

VASP makes use of OpenMP and OpenACC when possible. Note that these options conflict with the NCORE parallelization.
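
A sketch of what a hybrid MPI + OpenMP launch might look like (the launcher, the executable name vasp_std, and the rank and thread counts are placeholders that depend on your build and batch system); since threading conflicts with the NCORE parallelization, NCORE is left at its default of 1:

    export OMP_NUM_THREADS=4   # 4 OpenMP threads per MPI rank (placeholder value)
    mpirun -np 32 vasp_std     # 32 MPI ranks; NCORE is not set in the INCAR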

Optimizing the parallelization

Tip: We offer only general advice here. The performance for specific systems may differ significantly. However, one is often interested in a series of similar calculations. In that case, run a few of them with varying parallel setups and use the optimal choice of parameters for the rest.

When optimizing the parallel performance, try to stay as close as possible to the actual system. This includes both the physical system (atoms, cell size, cutoff, ...) and the computational hardware (CPUs, interconnect, number of nodes, ...). If too many parameters differ, the parallel configuration may not be transferable to the production calculation. Nevertheless, a few steps of the repetitive task give a good idea of the optimal choice for the full calculation. For example, run only a few electronic or ionic self-consistency steps instead of converging fully.
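
One possible way to set up such a short benchmark is to cap the number of steps in the INCAR (a sketch; the step counts are arbitrary and the rest of the production INCAR stays unchanged):

    NELM = 5    # at most 5 electronic self-consistency steps per ionic step
    NSW  = 1    # a single ionic step is usually enough to time the setup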

Often, combining multiple parallelization options yields the fastest results because the parallel efficiency of each level drops near its limit. For the default option (band parallelization), the limit is NBANDS divided by a small integer. Note that VASP will increase NBANDS to match the number of cores. Choose NCORE as a factor of the number of cores per node to avoid inter-node communication for the FFTs. The k-point parallelization is efficient but requires additional memory. Given sufficient memory, increase KPAR up to the number of irreducible k points; keep in mind that KPAR should evenly divide the number of k points. Finally, IMAGES is required to split several VASP runs into separate calculations; its limit is dictated by the number of desired calculations.
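
To make this arithmetic concrete, consider a hypothetical machine with 4 nodes of 32 cores each (128 cores in total) and a calculation with 8 irreducible k points; one consistent choice would be:

    NCORE = 8    # 8 divides the 32 cores per node, so each FFT group stays on one node
    KPAR  = 4    # 4 evenly divides the 8 irreducible k points

Each k-point group then runs on 128 / 4 = 32 cores and treats 8 / 4 = 2 k points, while 32 / 8 = 4 bands are handled simultaneously within each group, each by 8 cores.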

Caveat about the MPI setup

Additional parallelization options

OpenMP/OpenACC