Hi Junchao, on our system neither `scontrol show job <job_id> -dd` nor looking at
CUDA_VISIBLE_DEVICES provides information about which MPI process is associated
with which GPU on the node. I can see this with nvidia-smi, but if you have any
other suggestion using Slurm I would like to hear it.
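In the meantime, one thing I could do is have each rank report the mapping itself. Below is a small sketch I put together that reads the Slurm and CUDA environment of a process and reports which device it would pick under a simple local-rank round-robin; the round-robin choice is my assumption about how devices get assigned, not something I have confirmed for our MPI runtime:

```python
import os

def rank_gpu_report(env=None):
    """Report which GPU this rank would pick under a simple local-rank
    round-robin over the devices Slurm exposed via CUDA_VISIBLE_DEVICES.
    (Assumption: the runtime assigns devices this way; it may not.)"""
    if env is None:
        env = os.environ
    local_rank = int(env.get("SLURM_LOCALID", "0"))
    visible = [d for d in env.get("CUDA_VISIBLE_DEVICES", "").split(",") if d]
    if not visible:
        return f"local rank {local_rank}: no visible GPUs"
    return f"local rank {local_rank} -> GPU {visible[local_rank % len(visible)]}"

if __name__ == "__main__":
    print(rank_gpu_report())
```

Launching this with one process per rank (e.g. `srun python rank_gpu_report.py`) would print one mapping line per rank, which we could compare against the nvidia-smi process table.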
I've been trying to compile the code plus PETSc on Summit, but I have been
having all sorts of issues related to Spectrum MPI and the different compilers
they provide (I tried gcc, nvhpc, pgi, and xl; some of them don't handle Fortran
2018, others give errors about repeated MPI definitions, etc.).
I also wanted to ask you, do you know if it is possible to compile PETSc with
the xl/16.1.1-10 suite?
Thanks!
I configured the library with --with-cuda, and when compiling I get a
compilation error from CUDAC:
CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o
In file included from
/autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:
In file included from
/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:
In file included from
/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:
In file included from
/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:
In file included from
/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:
In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:
In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:
In file included from
/sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:
/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6:
warning: Thrust requires at least Clang 7.0. Define
THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
[-W#pragma-messages]
THRUST_COMPILER_DEPRECATION(Clang 7.0);
^
/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note:
expanded from macro 'THRUST_COMPILER_DEPRECATION'
THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define
THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
^
/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note:
expanded from macro 'THRUST_COMP_DEPR_IMPL'
# define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)
^
/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note:
expanded from macro 'THRUST_COMP_DEPR_IMPL0'
# define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
^
<scratch space>:141:6: note: expanded from here
GCC warning "Thrust requires at least Clang 7.0. Define
THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
^
In file included from
/autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:
In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:
In file included from
/sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:
In file included from
/sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:
In file included from
/sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:
In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:
In file included from
/sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:
In file included from
/sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:
In file included from
/sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:
In file included from
/sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:
In file included from
/sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:
In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:
/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB
requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to
suppress this message. [-W#pragma-messages]
CUB_COMPILER_DEPRECATION(Clang 7.0);
^
/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded
from macro 'CUB_COMPILER_DEPRECATION'
CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define
CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
^
/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded
from macro 'CUB_COMP_DEPR_IMPL'
# define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)
^
/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded
from macro 'CUB_COMP_DEPR_IMPL0'
# define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
^
<scratch space>:198:6: note: expanded from here
GCC warning "CUB requires at least Clang 7.0. Define
CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
^
/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68):
warning #1835-D: attribute "warn_unused_result" does not apply here
[... the same Thrust and CUB "requires at least Clang 7.0" warning chains and
the petscsystypes.h warning #1835-D are repeated here for the second include of
the headers ...]
/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:55:3:
error: use of undeclared identifier '__builtin_assume'
; __builtin_assume(a);
^
[... 18 further "use of undeclared identifier '__builtin_assume'" errors
follow, at petscstring.h lines 78 through 437 ...]
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
Error while processing /tmp/tmpxft_0001add6_00000000-6_curand2.cudafe1.cpp.
gmake[3]: *** [gmakefile:209:
arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1
gmake[2]: ***
[/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28:
libs] Error 2
**************************ERROR*************************************
Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log
Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to
[email protected]
********************************************************************
________________________________
From: Junchao Zhang <[email protected]>
Sent: Monday, August 21, 2023 4:17 PM
To: Vanella, Marcos (Fed) <[email protected]>
Cc: PETSc users list <[email protected]>; Guan, Collin X. (Fed)
<[email protected]>
Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi
processes and 1 GPU
That is a good question. Looking at
https://slurm.schedmd.com/gres.html#GPU_Management, I was wondering if you could
share the output of your job so we can search for CUDA_VISIBLE_DEVICES and see
how the GPUs were allocated.
--Junchao Zhang
On Mon, Aug 21, 2023 at 2:38 PM Vanella, Marcos (Fed)
<[email protected]<mailto:[email protected]>> wrote:
Ok, thanks Junchao. So is GPU 0 actually allocating memory for all 8 MPI
processes' meshes but only doing work on 2 of them?
The nvidia-smi output says it has allocated 2.4 GB.
Best,
Marcos
________________________________
From: Junchao Zhang <[email protected]<mailto:[email protected]>>
Sent: Monday, August 21, 2023 3:29 PM
To: Vanella, Marcos (Fed)
<[email protected]<mailto:[email protected]>>
Cc: PETSc users list <[email protected]<mailto:[email protected]>>;
Guan, Collin X. (Fed) <[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi
processes and 1 GPU
Hi, Marcos,
If you look at the PIDs in the nvidia-smi output, you will only find 8 unique
PIDs, which is expected since you allocated 8 MPI ranks per node.
The duplicate PIDs usually belong to threads spawned by the MPI runtime (for
example, progress threads in the MPI implementation). So your job script and
output are all good.
Thanks.
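A quick way to check this from the nvidia-smi process table is to count the distinct PIDs; a minimal sketch, with the sample rows abbreviated from your output:

```python
def unique_pids(process_rows):
    """Collapse nvidia-smi process-table rows, given as (gpu, pid) pairs,
    to the sorted set of distinct PIDs. A PID appearing under several
    GPUs is the same rank (its threads), not an extra process."""
    return sorted({pid for _gpu, pid in process_rows})

# Abbreviated sample from your table: 5 rows, but only 4 distinct PIDs,
# because PID 214627 shows up on both GPU 0 and GPU 1.
rows = [(0, 214626), (0, 214627), (1, 214627), (2, 214628), (2, 214632)]
```

Running `unique_pids` over all 14 rows of your table should give exactly the 8 PIDs 214626 through 214633, matching the 8 ranks per node.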
On Mon, Aug 21, 2023 at 2:00 PM Vanella, Marcos (Fed)
<[email protected]<mailto:[email protected]>> wrote:
Hi Junchao, something I'm noticing when running the CUDA-enabled linear solvers
(CG+HYPRE, CG+GAMG) in multi-CPU/multi-GPU calculations is that GPU 0 on the
node seems to be taking the sub-matrices corresponding to all of the MPI
processes on the node. This is the result of the nvidia-smi command on a node
with 8 MPI processes (each advancing the same number of unknowns in the
calculation) and 4 V100 GPUs:
Mon Aug 21 14:36:07 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB           On  | 00000004:04:00.0 Off |                    0 |
| N/A   34C    P0              63W / 300W |   2488MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB           On  | 00000004:05:00.0 Off |                    0 |
| N/A   38C    P0              56W / 300W |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB           On  | 00000035:03:00.0 Off |                    0 |
| N/A   35C    P0              52W / 300W |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB           On  | 00000035:04:00.0 Off |                    0 |
| N/A   38C    P0              53W / 300W |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    214626      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    0   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214630      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    0   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    0   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      308MiB |
|    1   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    1   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    2   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    2   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    3   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
|    3   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux      318MiB |
+---------------------------------------------------------------------------------------+
You can see that GPU 0 is connected to all 8 MPI processes, each taking about
300 MiB on it, whereas GPUs 1, 2, and 3 are each working with 2 MPI processes.
I'm wondering whether this is expected or whether there are changes I need to
make to my submission script/runtime parameters.
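For reference, here is the rank-to-GPU assignment I would have expected under a plain local-rank round-robin; this is only a sketch of the mapping I assumed, not necessarily what the solver stack actually does:

```python
def round_robin(n_ranks, n_gpus):
    """Expected rank -> GPU table if each local rank took gpu = rank % n_gpus."""
    table = {gpu: [] for gpu in range(n_gpus)}
    for rank in range(n_ranks):
        table[rank % n_gpus].append(rank)
    return table
```

With 8 ranks and 4 GPUs this gives 2 ranks per GPU, so every GPU should show 2 processes, whereas in the table above GPU 0 shows all 8.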
This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPU/node):
#!/bin/bash
# ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds
#SBATCH -J test
#SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err
#SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log
#SBATCH --partition=gpu
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --nodes=2
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:4
export OMP_NUM_THREADS=1
# modules
module load cuda/11.7
module load gcc/11.2.1/toolset
module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7
cd /home/mnv/Firemodels_fork/fds/Issues/PETSc
srun -N 2 -n 16
/home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds
-pc_type gamg -mat_type aijcusparse -vec_type cuda
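One thing I may also try is asking Slurm to bind each task to a single GPU; this is an assumption on my part (the options are standard srun GPU flags, but I haven't verified that they change the mapping on this cluster):

```shell
# Hypothetical variant of the srun line above: bind each task to one GPU
# so every rank only sees "its" device.
srun -N 2 -n 16 --gpus-per-node=4 --gpu-bind=single:1 \
  /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds \
  -pc_type gamg -mat_type aijcusparse -vec_type cuda
```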
Thank you for the advice,
Marcos