ORCUS (CEA-ISAS-DM2S): compilation on single- and multi-GPU - doc updated August 28th, 2025

On ORCUS, several GPU partitions are available. Here we give an example of compiling and running a job on the gpuq_h100 partition, which is equipped with 4 H100 GPUs. For cmake, the flag -DKokkos_ARCH_HOPPER90=ON is used for the H100. The test is performed with the NSAC_Comp problem.

1. Connection to Orcus

Connection to ORCUS

loginname@is247529:~$ ssh -XC orcusloginamd2
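
If you connect to ORCUS regularly, an SSH configuration entry can shorten this command. A minimal sketch, assuming loginname is your ORCUS login and that the entry is added to ~/.ssh/config on your workstation (the alias orcus is hypothetical):

Host orcus
    HostName orcusloginamd2
    User loginname
    ForwardX11 yes
    Compression yes

The connection then reduces to:

$ ssh orcus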

2. Create the makefile with a shell script

Several scripts are available in the compilation folder of LBM_Saclay to create the makefile for your own desktop (local) and for the topaze and orcus supercomputers. For the orcus supercomputer, a configure_build.sh file is provided in a subdirectory for each GPU type (V100, A100 and H100).
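
To see which configuration scripts are available, you can list them from the top-level directory of the sources; a minimal sketch with standard shell commands:

$ cd LBM_Saclay_Rech-Dev
$ find compilation -name configure_build.sh

The path of each script indicates the targeted machine and GPU, e.g. compilation/orcus/cuda_h100_multigpu/configure_build.sh used below.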

script

To create the makefile for GPU H100 with MPI, simply execute the shell script configure_build.sh:

$ cd LBM_Saclay_Rech-Dev
$ ./compilation/orcus/cuda_h100_multigpu/configure_build.sh

returns

Loading hdf5/nvhpc_25.3_openmpi_5.0.3/1.14.3
Loading requirement: gcccore/.14.2.0 cuda/12.8.0 nvhpc/25.3 hwloc/2.11.1 ucx/1.18.0 pmix/5.0.3 prrte/3.0.6 ucc/1.3.0 openmpi/nvhpc_25.3/5.0.3 nvompi/253.503 szip/2.1.1
Currently Loaded Modulefiles:
1) cmake/3.28.3      3) cuda/12.8.0   5) hwloc/2.11.1   7) pmix/5.0.3    9) ucc/1.3.0                 11) nvompi/253.503  13) hdf5/nvhpc_25.3_openmpi_5.0.3/1.14.3
2) gcccore/.14.2.0   4) nvhpc/25.3    6) ucx/1.18.0     8) prrte/3.0.6  10) openmpi/nvhpc_25.3/5.0.3  12) szip/2.1.1

Key:
auto-loaded
The following problems are currently implemented:
0  AC
1  Advection-Diffusion
2  Crystal_growth_Younsi
3  GPMixt
4  GPMixtNS
5  GPMixtTernary
6  GPMuTernary
7  NS
8  NS_3phases_1comp_phase_change
9  NSAC_Comp
10 NSAC_Comp_3phases
11 NSAC_Comp_3phases3D
12 NSAC_coupling
13 NSAC_Fakhari
14 NSAC_Surfactant
Choose which problems to include by indicating a list of space or comma separated numbers, eg '0 1' or '0,1'.
Write 'all' to include all problems.
Problem numbers:

Enter the problem to be compiled

Write the number of your problem, e.g. 9 for the NSAC_Comp problem:

Problem numbers: 9

returns

Problem n°9  added to compile list (NSAC_Comp)
//===================================================
cmake command is : cmake -DKokkos_ENABLE_OPENMP=ON -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_HOPPER90=ON -DUSE_HDF5=ON -DCMAKE_SKIP_RPATH=ON -DCMAKE_BUILD_WITH_INSTALL_RPATH=OFF -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DPROBLEM=NSAC_Comp /home/catA/ac165432/CODEV-TULEAP_LBM_Saclay/LBM_Saclay_Rech-Dev

//===================================================
LBM_saclay build configuration:
//===================================================
C++ Compiler : NVHPC 25.3.0
   /product/rocky9-x86_64_cluster/apps/NVHPC/25.3/Linux_x86_64/25.3/compilers/bin/nvc++
MPI not enabled
Kokkos OpenMP enabled : ON
Kokkos CUDA   enabled : ON
Kokkos CUDA   Lambda  : ON
Kokkos CUDA   flags   : -extended-lambda;-Wext-lambda-captures-this;-arch=sm_90
Kokkos HWLOC  enabled : Off
HDF5 found version    : 1.14.3
HDF5 definitions      :
HDF5 parallel         : TRUE
HDF5 includes dirs    : /product/rocky9-x86_64_cluster/apps/HDF5/1.14.3-nvompi-253.503/include;/product/rocky9-x86_64_cluster/apps/Szip/2.1.1/include
HDF5 libraries        : /product/rocky9-x86_64_cluster/apps/HDF5/1.14.3-nvompi-253.503/lib/libhdf5.so;/usr/lib64/libpthread.a;/product/rocky9-x86_64_cluster/apps/Szip/2.1.1/lib/libsz.so;/usr/lib64/libz.so;/usr/lib64/libdl.a;/usr/lib64/libm.so

//===================================================

build configured in:
/home/catA/ac165432/CODEV-TULEAP_LBM_Saclay/LBM_Saclay_Rech-Dev/build_cuda_h100/build_NSAC_Comp
go there and use make to compile

Go there to compile

$ cd build_cuda_h100/build_NSAC_Comp
$ make -j 22

After about 15 minutes, the binary LBM_saclay is created in the directory build_cuda_h100/build_NSAC_Comp/src.
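
Before submitting a batch job, you can quickly check that the binary is present and that its shared libraries are resolved (with the same modules still loaded); a minimal sketch with standard tools, run from the build directory:

$ ls -lh src/LBM_saclay
$ ldd src/LBM_saclay | grep hdf5

Libraries reported as "not found" usually mean that the modules listed above are no longer loaded.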

3. Details of commands for compiling

Single GPU: compilation and run (deprecated)

Single-GPU compilation on ORCUS (deprecated)

Modules to load

$ source /tmpformation/LBM_Saclay/modules_compil_h100_v11fev2025

where the modules list inside file modules_compil_h100_v11fev2025 is

module purge
module load cmake/3.28.3
module load gcc/13.3.0
module load cuda/12.4.0
module load hdf5/gcc_13.3.0_openmpi_4.1.6/1.14.3
module list

cmake and compilation

$ mkdir build_cuda_h100
$ cd build_cuda_h100

Creation of makefile

$ /tmpformation/LBM_Saclay/cmake_h100.scr

Compilation

$ make -j 22

Multi-GPU: compilation and run on multi-H100 with MPI (updated July 8th, 2025)

Modules to load

$ module load cmake/3.28.3
$ module load hdf5/nvhpc_25.3_openmpi_5.0.3/1.14.3

The hdf5 module loads the following ones: gcccore/.14.2.0 cuda/12.8.0 nvhpc/25.3 hwloc/2.11.1 ucx/1.18.0 pmix/5.0.3 prrte/3.0.6 ucc/1.3.0 openmpi/nvhpc_25.3/5.0.3 nvompi/253.503 szip/2.1.1.
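
Before creating the build directory, it can be useful to check that the expected compiler and MPI launcher are found in the PATH; a minimal sketch:

$ module list
$ which nvc++ mpirun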

Create a directory

$ mkdir build_cuda_h100_mpi
$ cd build_cuda_h100_mpi

Next, two methods are possible. In the first one, turn off the RPATH:

First method: turn off the RPATH in cmake

For that purpose you must add the following options after -DPROBLEM=NSAC_Comp in your cmake command: -DCMAKE_SKIP_RPATH=ON -DCMAKE_BUILD_WITH_INSTALL_RPATH=OFF -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE

$ cmake -DKokkos_ENABLE_OPENMP=ON -DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_CUDA_LAMBDA=ON -DKokkos_ARCH_HOPPER90=ON -DUSE_MPI=ON -DUSE_MPI_CUDA_AWARE_ENFORCED=ON -DKokkos_ENABLE_HWLOC=ON -DUSE_HDF5=ON -DPROBLEM=NSAC_Comp -DCMAKE_SKIP_RPATH=ON -DCMAKE_BUILD_WITH_INSTALL_RPATH=OFF -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE ..

$ make -j 22
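
With this method no RPATH is embedded in the binary, so at run time the shared libraries are found through the LD_LIBRARY_PATH set by the modules (this is why the Slurm script below reloads them). To verify it, a minimal sketch with readelf, run from the build directory (an empty result means no RPATH/RUNPATH entry was embedded):

$ readelf -d src/LBM_saclay | grep -iE 'rpath|runpath'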

In the second one, use the command make install:

Second method: make install

Add an option for cmake: -DCMAKE_INSTALL_PREFIX=${LBMSACLAY_TOP_DIR}/install/, where LBMSACLAY_TOP_DIR is the top-level LBM_Saclay directory.
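
For example, you can define this variable before calling cmake; a minimal sketch, assuming the sources are in your home directory (adjust the path to your own clone):

$ export LBMSACLAY_TOP_DIR=$HOME/LBM_Saclay_Rech-Dev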

$ cmake -DKokkos_ENABLE_OPENMP=ON -DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_CUDA_LAMBDA=ON -DKokkos_ARCH_HOPPER90=ON -DUSE_MPI=ON -DUSE_MPI_CUDA_AWARE_ENFORCED=ON -DKokkos_ENABLE_HWLOC=ON -DUSE_HDF5=ON -DPROBLEM=NSAC_Comp -DCMAKE_INSTALL_PREFIX=${LBMSACLAY_TOP_DIR}/install/ ..
$ make -j 22
$ make install

Write a script to submit your job, e.g. named GPU_H100_Taylor-Bubble3D.slurm:

#!/bin/bash

# Slurm job parameters. In particular, adjust the walltime, the job name,
# the output/error files, and the email address.
# More information in section 5 of the Orcus user manual.

#SBATCH -N 1                                  # Number of nodes
#SBATCH -n 4                                  # Number of tasks
#SBATCH --gres=gpu:4                          # Number of GPUs on Orcus (option required since December)
#SBATCH -p gpuq_h100                          # Partition suited to the job
#SBATCH -t 12:00:00                           # Walltime
#SBATCH -J myjob                              # Job name
#SBATCH -o myjob_%j.o                         # Output file
#SBATCH -e myjob_%j.e                         # Error file
#SBATCH --mail-user=user@cea.fr               # Email address
#SBATCH --mail-type=begin,end,fail            # Send an email at job start, end and failure

# Modules required to run LBM_saclay.
module purge
module load cmake/3.28.3
module load hdf5/nvhpc_25.3_openmpi_5.0.3/1.14.3

# Run LBM_saclay. Replace the two paths as needed.
mpirun -np 4 /home/catA/loginname/LBM_Saclay_Rech-Dev/build_cuda_h100_mpi/src/LBM_saclay $1 --kokkos-map-device-id-by=mpi_rank
exit 0

The job will run with 4 MPI processes (#SBATCH -n 4 and mpirun -np 4) on the gpuq_h100 partition (#SBATCH -p gpuq_h100).
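
Optionally, to check that the 4 H100 GPUs are visible on the allocated node, a query can be added just before the mpirun line of the script; a minimal sketch with nvidia-smi (hypothetical addition, not part of the original script):

# Optional check: list the GPUs available on the allocated node
nvidia-smi --query-gpu=index,name,memory.total --format=csv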

Submit your job

Submit your job with the following command:

$ sbatch GPU_H100_Taylor-Bubble3D.slurm TestCase19_Taylor-Bubble3D_004-GPU.ini

where TestCase19_Taylor-Bubble3D_004-GPU.ini is the input file for LBM_Saclay with the appropriate domain decomposition: the product mx×my×mz in the [mpi] section must equal the number of MPI processes (here 1×1×4 = 4, one GPU per subdomain along z). For example, in the input file TestCase19_Taylor-Bubble3D_004-GPU.ini, the [mesh] and [mpi] sections are set to

[mesh]
nx=128
ny=128
nz=318
xmin=-64.0
xmax=64.0
ymin=-64.0
ymax=64.0
zmin=0.0
zmax=1272.0

[mpi]
mx=1
my=1
mz=4
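
Once the job is submitted, standard Slurm commands can be used to follow it; a minimal sketch, where <jobid> is the job number returned by sbatch (the file name myjob_%j.o is defined in the script above):

$ squeue -u $USER
$ tail -f myjob_<jobid>.o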

Section author: Alain Cartalade