OpenACC GPU port of VASP
Revision as of 16:49, 10 February 2021
With VASP.6.2.0 we officially released the OpenACC GPU-port of VASP. Official in the sense that we now strongly recommend using this OpenACC version to run VASP on GPU-accelerated systems.
The previous CUDA-C GPU-port of VASP is considered to be deprecated and is no longer actively developed, maintained, or supported. In the near future, the CUDA-C GPU-port of VASP will be dropped completely.
Requirements
Software stack
Compiler
- To compile the OpenACC version of VASP you need either the NVIDIA HPC-SDK or a recent version (>=19.10) of PGI's Compilers & Tools.
- In principle any compiler that supports at least OpenACC standard 2.6 should work, but we have only tried and tested the aforementioned ones.
Libraries
- When compiling with PGI Compilers & Tools: the QD (software emulated quadruple precision arithmetic) and NCCL (>=2.7.8) libraries. (Conveniently, these libraries are part of the NVIDIA HPC-SDK.)
- An installation of NVIDIA's CUDA Toolkit (>= 10.0): the necessary parts are already bundled into the NVIDIA HPC-SDK and PGI's Compilers & Tools, so there is no need to separately install the CUDA Toolkit if you use either of the latter compiler suites.
- A CUDA-aware version of MPI: the OpenMPI installations that ship with the NVIDIA HPC-SDK and PGI's Compilers & Tools are CUDA-aware.
Drivers
- You need a CUDA driver that supports at least CUDA-10.0 (see above).
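The minimum versions listed above can be checked mechanically before attempting a build. The following is a minimal sketch in Python, not part of VASP: the names `MINIMUM`, `parse_version`, and `meets_minimum` are hypothetical, and the version strings are assumed to be plain dotted numbers that you have already obtained from your own installation. Versions are compared as tuples of integers, because a plain string or floating-point comparison would wrongly rank 19.9 above 19.10.

```python
# Minimum versions quoted in the requirements above (component names are
# illustrative labels, not identifiers used by VASP or NVIDIA).
MINIMUM = {
    "pgi": (19, 10),    # PGI Compilers & Tools
    "nccl": (2, 7, 8),  # NCCL library
    "cuda": (10, 0),    # CUDA Toolkit and driver support
}


def parse_version(text):
    """Turn a dotted version string such as '2.7.8' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(component, installed):
    """Return True if the installed version satisfies the listed minimum."""
    required = MINIMUM[component]
    found = parse_version(installed)
    # Pad the shorter tuple with zeros so that '10' compares equal to '10.0'.
    width = max(len(required), len(found))
    required = required + (0,) * (width - len(required))
    found = found + (0,) * (width - len(found))
    return found >= required


if __name__ == "__main__":
    for component, installed in [("pgi", "19.9"), ("nccl", "2.7.8"), ("cuda", "10")]:
        status = "ok" if meets_minimum(component, installed) else "too old"
        print(f"{component} {installed}: {status}")
```

How the installed versions are queried depends on the system and is deliberately left out of this sketch.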
Hardware
We have only tested the OpenACC GPU-port of VASP with the following NVIDIA GPUs:
- NVIDIA datacenter GPUs: P100 (Pascal), V100 (Volta), and A100 (Ampere).
- NVIDIA Quadro GPUs: GP100 (Pascal), and GV100 (Volta).
N.B.: Running VASP on other NVIDIA GPUs (e.g. "gaming" hardware) is technically possible but not advisable: these GPUs are not well suited since they do not offer fast double-precision floating-point (FP64) arithmetic, and in general they have smaller memories without error correction code (ECC) capabilities.
Features and limitations
- Most features of VASP have been ported to GPU using OpenACC, with the notable exception of everything involving the RPA: GW and ACFDT. Porting these is work in progress.
- The use of parallel FFTs of the wave functions (NCORE>1) should be avoided for performance reasons. Currently the OpenACC version will automatically switch to NCORE=1 even if otherwise specified in the INCAR file.
- Due to the use of NCCL, the OpenACC version of VASP may only be executed using a single MPI-rank per available GPU:
- Using NCCL has large performance benefits in the majority of cases. However, we are aware that for calculations on small systems it would be useful to retain the ability to have multiple MPI-ranks share a GPU, and we plan to make the use of NCCL optional to remove this limitation.
Running the OpenACC version
- Use a single MPI-rank per GPU (currently the use of NCCL precludes the use of multiple ranks per GPU).
- Use OpenMP-threads in addition to MPI-ranks to leverage more of the available CPU power. Because the OpenACC version is currently limited to 1 MPI-rank/GPU, a considerable share of the CPU cores may otherwise remain unused. Since parts of the code still run CPU-side, using multiple OpenMP-threads per MPI-rank can be beneficial.
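The two recommendations above can be turned into a launch configuration with a little arithmetic: one MPI-rank per GPU, and the CPU cores of a node divided evenly among those ranks as OpenMP-threads. The following Python sketch illustrates this under stated assumptions: the function `launch_plan` is hypothetical, the executable name `vasp_std` and the use of Open MPI's `mpirun` with its `-np` and `-x` options are assumptions about your installation, and process binding and GPU affinity (which are site-specific) are not handled.

```python
def launch_plan(gpus_per_node, cores_per_node, nodes=1, executable="vasp_std"):
    """Sketch a launch configuration with one MPI-rank per GPU.

    Returns (total MPI-ranks, OpenMP-threads per rank, command as a list).
    """
    if gpus_per_node < 1 or nodes < 1:
        raise ValueError("need at least one GPU per node and one node")
    if cores_per_node < gpus_per_node:
        raise ValueError("fewer CPU cores than GPUs: cannot give each rank a core")

    # One MPI-rank per available GPU.
    ranks = gpus_per_node * nodes
    # Split the cores of a node evenly among the ranks on that node;
    # any remainder is left idle.
    threads = cores_per_node // gpus_per_node

    command = [
        "mpirun",
        "-np", str(ranks),
        "-x", f"OMP_NUM_THREADS={threads}",  # export the variable to all ranks
        executable,
    ]
    return ranks, threads, command


if __name__ == "__main__":
    ranks, threads, command = launch_plan(gpus_per_node=4, cores_per_node=40)
    print(f"{ranks} MPI-ranks with {threads} OpenMP-threads each")
    print(" ".join(command))
```

For a node with 4 GPUs and 40 CPU cores this yields 4 MPI-ranks with 10 OpenMP-threads each. Whether using all threads is actually faster depends on the calculation, so the thread count is best treated as a starting point for benchmarking.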