Hi Junchao, something I'm noticing when running with CUDA-enabled linear
solvers (CG+HYPRE, CG+GAMG) in multi-CPU/multi-GPU calculations is that
GPU 0 on the node seems to be taking the sub-matrices corresponding to
all the MPI processes on that node. This is the output of the nvidia-smi
command on a node with 8 MPI processes (each advancing the same number of
unknowns in the calculation) and 4 V100 GPUs:
Mon Aug 21 14:36:07 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB           On  | 00000004:04:00.0 Off |                    0 |
| N/A   34C    P0              63W / 300W |   2488MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB           On  | 00000004:05:00.0 Off |                    0 |
| N/A   38C    P0              56W / 300W |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB           On  | 00000035:03:00.0 Off |                    0 |
| N/A   35C    P0              52W / 300W |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB           On  | 00000035:04:00.0 Off |                    0 |
| N/A   38C    P0              53W / 300W |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                           GPU Memory  |
|        ID   ID                                                            Usage       |
|=======================================================================================|
|    0   N/A  N/A    214626      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB  |
|    0   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB  |
|    0   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB  |
|    0   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB  |
|    0   N/A  N/A    214630      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB  |
|    0   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB  |
|    0   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB  |
|    0   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB  |
|    1   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB  |
|    1   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB  |
|    2   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB  |
|    2   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB  |
|    3   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB  |
|    3   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB  |
+---------------------------------------------------------------------------------------+
You can see that GPU 0 is connected to all 8 MPI processes, each taking about
300 MiB on it, whereas GPUs 1, 2, and 3 are each working with only 2 MPI
processes. I'm wondering whether this is expected, or whether I need to make
changes to my submission script or runtime parameters.
This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPUs/node):
#!/bin/bash
# ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds
#SBATCH -J test
#SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err
#SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log
#SBATCH --partition=gpu
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --nodes=2
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:4
export OMP_NUM_THREADS=1
# modules
module load cuda/11.7
module load gcc/11.2.1/toolset
module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7
cd /home/mnv/Firemodels_fork/fds/Issues/PETSc
srun -N 2 -n 16 \
  /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds \
  -pc_type gamg -mat_type aijcusparse -vec_type cuda
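For reference, here is a sketch of the kind of per-rank binding wrapper I imagine could force each process onto one device before CUDA initializes; the script name and the round-robin modulo-4 mapping are just my illustration, not something I've confirmed works for this solver stack:

```shell
#!/bin/bash
# bind_gpu.sh (hypothetical name): restrict each MPI rank to a single GPU.
# SLURM_LOCALID is the rank's index within its node (0..7 with 8 ranks/node),
# so with 4 GPUs per node the modulo maps two ranks to each GPU.
export CUDA_VISIBLE_DEVICES=$(( SLURM_LOCALID % 4 ))
# Replace this process with the real command so the environment is inherited.
exec "$@"
```

It would be invoked by placing the wrapper in front of the binary, e.g.
srun -N 2 -n 16 ./bind_gpu.sh /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds ... , so each rank only sees its assigned device as device 0.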
Thank you for the advice,
Marcos