> On Aug 24, 2023, at 2:00 PM, Vanella, Marcos (Fed) <[email protected]> 
> wrote:
> 
> Thank you Barry, I will dial back the MPI_F08 use in our source code and try 
> compiling it. I haven't found much information about mixing USE MPI and USE 
> MPI_F08 across different modules, other than the following link from several 
> years ago:
> 
> https://users.open-mpi.narkive.com/eCCG36Ni/ompi-fortran-problem-when-mixing-use-mpi-and-use-mpi-f08-with-gfortran-5
> 
> It looks like this has been fixed for openmpi and newer gfortran versions, 
> since I don't see issues with that MPI lib/compiler combination. Same with 
> openmpi/ifort.
> What I find quite interesting is that I assumed the PRIVATE statement in a 
> module would act as a backstop against the propagation of entities not 
> explicitly listed in the module's PUBLIC statement, including entities that 
> come from upstream modules visible through USE. That does not seem to be the 
> case here.

   I agree; your different tests gave seemingly inconsistent results. It could 
be bugs in the compiler's handling of modules.
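
   For what it's worth, the kind of clash the compiler is reporting is easy to 
reproduce in isolation: if two accessible modules each provide a generic 
interface with the same name and indistinguishable specific procedures, a 
reference to that generic becomes ambiguous. A minimal, self-contained sketch 
(the module and procedure names below are made up for illustration; they are 
not the actual MPI or PETSc ones):

MODULE IFACE_A
IMPLICIT NONE
! First set of interfaces for the generic name INIT_THREAD.
INTERFACE INIT_THREAD
  SUBROUTINE INIT_THREAD_A(REQUIRED,PROVIDED,IERR)
    INTEGER, INTENT(IN)  :: REQUIRED
    INTEGER, INTENT(OUT) :: PROVIDED,IERR
  ENDSUBROUTINE INIT_THREAD_A
ENDINTERFACE
ENDMODULE IFACE_A

MODULE IFACE_B
IMPLICIT NONE
! A second, indistinguishable set of interfaces for the same generic name.
INTERFACE INIT_THREAD
  SUBROUTINE INIT_THREAD_B(REQUIRED,PROVIDED,IERR)
    INTEGER, INTENT(IN)  :: REQUIRED
    INTEGER, INTENT(OUT) :: PROVIDED,IERR
  ENDSUBROUTINE INIT_THREAD_B
ENDINTERFACE
ENDMODULE IFACE_B

PROGRAM AMBIG
! Both interface sets are visible here, so the call below is ambiguous and
! most compilers reject it, much like the NVFORTRAN-S-0155 error in this thread.
USE IFACE_A
USE IFACE_B
IMPLICIT NONE
INTEGER :: PROVIDED,IERR
CALL INIT_THREAD(0,PROVIDED,IERR)
ENDPROGRAM AMBIG

In the mpitest.f90 reproducer below, the blanket PRIVATE in TEST_MOD should, 
per the standard, keep the second set of interfaces from being visible in MAIN, 
so the fact that nvfortran still flags the generics would be consistent with 
the interfaces leaking through the module information, i.e. a bug of the kind 
mentioned above.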


> 
> Best,
> Marcos
> 
>  
> From: Barry Smith <[email protected]>
> Sent: Thursday, August 24, 2023 12:40 PM
> To: Vanella, Marcos (Fed) <[email protected]>
> Cc: PETSc users list <[email protected]>; Guan, Collin X. (Fed) 
> <[email protected]>
> Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi 
> processes and 1 GPU
>  
> 
>    PETSc uses the non-MPI_F08 Fortran modules, so I am guessing that when you 
> also use the MPI_F08 modules the compiler sees two sets of interfaces for the 
> same functions, hence the error. I am not sure if it is portable to use PETSc 
> with the F08 Fortran modules in the same program or routine.
> 
> 
> 
> 
> 
>> On Aug 24, 2023, at 12:22 PM, Vanella, Marcos (Fed) via petsc-users 
>> <[email protected]> wrote:
>> 
>> Thank you Matt and Junchao. I've been testing further with nvhpc on Summit. 
>> You might have an idea of what is going on here. 
>> These are my modules:
>> 
>> Currently Loaded Modules:
>>   1) lsf-tools/2.0
>>   2) hsi/5.0.2.p5
>>   3) darshan-runtime/3.4.0-lite
>>   4) xalt/1.2.1
>>   5) DefApps
>>   6) nvhpc/22.11
>>   7) spectrum-mpi/10.4.0.3-20210112
>>   8) nsight-compute/2021.2.1
>>   9) nsight-systems/2021.3.1.54
>>  10) cuda/11.7.1
>> 
>> I configured and compiled petsc with these options:
>> 
>> ./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" 
>> FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=0 
>> --download-suitesparse --download-hypre --download-fblaslapack --with-cuda
>> 
>> without issues. The MPI checks did not go through, as this was done on the 
>> login node.
>> 
>> Then I started getting ambiguous-interface errors related to MPI routines 
>> (similar to what I saw with pgi and gcc on Summit). I was able to write a 
>> simple piece of code that reproduces this. It comes down to having a USE 
>> PETSC statement in a module (TEST_MOD) and a USE MPI_F08 statement in the 
>> main program (MAIN) that uses that module, even though the PRIVATE statement 
>> is present in said module (TEST_MOD).
>> 
>> MODULE TEST_MOD
>> ! In this module we use PETSC.
>> USE PETSC
>> !USE MPI
>> IMPLICIT NONE
>> PRIVATE
>> PUBLIC :: TEST1
>> 
>> CONTAINS
>> SUBROUTINE TEST1(A)
>> IMPLICIT NONE
>> REAL, INTENT(INOUT) :: A
>> INTEGER :: IERR
>> A=0.
>> ENDSUBROUTINE TEST1
>> 
>> ENDMODULE TEST_MOD
>> 
>> 
>> PROGRAM MAIN
>> 
>> ! Assume in main we use some MPI_F08 features.
>> USE MPI_F08
>> USE TEST_MOD, ONLY : TEST1
>> IMPLICIT NONE
>> INTEGER :: MY_RANK,IERR=0
>> INTEGER :: PNAMELEN=0
>> INTEGER :: PROVIDED
>> INTEGER, PARAMETER :: REQUIRED=MPI_THREAD_FUNNELED
>> REAL :: A=0.
>> CALL MPI_INIT_THREAD(REQUIRED,PROVIDED,IERR)
>> CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR)
>> CALL TEST1(A)
>> CALL MPI_FINALIZE(IERR)
>> 
>> ENDPROGRAM MAIN
>> 
>> Leaving the USE PETSC statement in TEST_MOD, this is what I get when trying 
>> to compile this code:
>> 
>> vanellam@login5 test_spectrum_issue $ mpifort -c 
>> -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/" 
>> -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-c-opt-nvhpc/include"
>>   mpitest.f90
>> NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_init_thread 
>> (mpitest.f90: 34)
>> NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_finalize 
>> (mpitest.f90: 37)
>>   0 inform,   0 warnings,   2 severes, 0 fatal for main
>> 
>> Now, if I change USE PETSC to USE MPI in the module TEST_MOD, compilation 
>> proceeds correctly. If I leave the USE PETSC statement in the module and 
>> change the statement in main to USE MPI, compilation also goes through. So it 
>> seems to be something related to using the PETSC and MPI_F08 modules 
>> together. My take is that it is related to spectrum-mpi, as I haven't had 
>> issues compiling FDS+PETSc with openmpi on other systems.
>> 
>> Please let me know if you have any ideas about what might be going on. 
>> I'll move to Polaris and try with mpich too.
>> 
>> Thanks!
>> Marcos
>> 
>> 
>> From: Junchao Zhang <[email protected]>
>> Sent: Tuesday, August 22, 2023 5:25 PM
>> To: Matthew Knepley <[email protected]>
>> Cc: Vanella, Marcos (Fed) <[email protected]>; PETSc users list 
>> <[email protected]>; Guan, Collin X. (Fed) <[email protected]>
>> Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi 
>> processes and 1 GPU
>>  
>> Marcos,
>>   yes, refer to the example script Matt mentioned for Summit.  Feel free to 
>> turn on/off options in the file.  In my experience, gcc is easier to use.
>>   Also, I found 
>> https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus, 
>> which might be similar to your machine (4 GPUs per node).  The key point is: 
>> The Cray MPI on Polaris does not currently support binding MPI ranks to 
>> GPUs. For applications that need this support, this instead can be handled 
>> by use of a small helper script that will appropriately set 
>> CUDA_VISIBLE_DEVICES for each MPI rank.
>>   So you can try the helper script set_affinity_gpu_polaris.sh to manually 
>> set CUDA_VISIBLE_DEVICES. In other words, put the script on your PATH and 
>> then run your job with
>>       srun -N 2 -n 16 set_affinity_gpu_polaris.sh 
>> /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux 
>> test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda
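>> 
>>   For reference, such a wrapper is usually only a few lines. A rough sketch 
>> (this assumes 4 GPUs per node and that the launcher exports a local-rank 
>> variable such as SLURM_LOCALID or OMPI_COMM_WORLD_LOCAL_RANK; it is not the 
>> actual ALCF script, so check what your system provides):
>> 
>> #!/bin/bash
>> # Map each MPI rank on a node to one GPU by restricting CUDA_VISIBLE_DEVICES.
>> num_gpus=4                                                # GPUs per node (assumption)
>> local_rank=${SLURM_LOCALID:-${OMPI_COMM_WORLD_LOCAL_RANK:-0}}
>> export CUDA_VISIBLE_DEVICES=$(( local_rank % num_gpus ))  # round-robin assignment
>> exec "$@"                                                 # run the real executable with its arguments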
>> 
>>   Then, check again with nvidia-smi to see if GPU memory is evenly allocated.
>> --Junchao Zhang
>> 
>> 
>> On Tue, Aug 22, 2023 at 3:03 PM Matthew Knepley <[email protected]> wrote:
>> On Tue, Aug 22, 2023 at 2:54 PM Vanella, Marcos (Fed) via petsc-users 
>> <[email protected]> wrote:
>> Hi Junchao, on our system neither slurm's scontrol show job_id -dd nor 
>> looking at CUDA_VISIBLE_DEVICES provides information about which MPI process 
>> is associated with which GPU in the node. I can see this with nvidia-smi, but 
>> if you have any other suggestion using slurm I would like to hear it.
>> 
>> I've been trying to compile the code+PETSc on Summit, but have been having 
>> all sorts of issues related to spectrum-mpi and the different compilers they 
>> provide (I tried gcc, nvhpc, pgi, xl; some of them don't handle Fortran 2018, 
>> others give issues with repeated MPI definitions, etc.). 
>> 
>> The PETSc configure examples are in the repository:
>> 
>>    
>> https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads
>> 
>>     Thanks,
>> 
>>       Matt
>>  
>> I also wanted to ask you, do you know if it is possible to compile PETSc 
>> with the xl/16.1.1-10 suite? 
>> 
>> Thanks!
>> 
>> I configured the library with --with-cuda, and when compiling I get a 
>> compilation error with CUDAC:
>> 
>> CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o
>> In file included from 
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:
>> In file included from 
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:
>> In file included from 
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:
>> In file included from 
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:
>> In file included from 
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: 
>> warning: Thrust requires at least Clang 7.0. Define 
>> THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. 
>> [-W#pragma-messages]
>>      THRUST_COMPILER_DEPRECATION(Clang 7.0);
>>      ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: 
>> note: expanded from macro 'THRUST_COMPILER_DEPRECATION'
>>   THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define 
>> THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>>   ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: 
>> note: expanded from macro 'THRUST_COMP_DEPR_IMPL'
>> #  define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)
>>                                      ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: 
>> note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'
>> #  define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>>                                        ^
>> <scratch space>:141:6: note: expanded from here
>>  GCC warning "Thrust requires at least Clang 7.0. Define 
>> THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>>      ^
>> In file included from 
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:
>> In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB 
>> requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to 
>> suppress this message. [-W#pragma-messages]
>>      CUB_COMPILER_DEPRECATION(Clang 7.0);
>>      ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: 
>> expanded from macro 'CUB_COMPILER_DEPRECATION'
>>   CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define 
>> CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>>   ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: 
>> expanded from macro 'CUB_COMP_DEPR_IMPL'
>> #  define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)
>>                                   ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: 
>> expanded from macro 'CUB_COMP_DEPR_IMPL0'
>> #  define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>>                                     ^
>> <scratch space>:198:6: note: expanded from here
>>  GCC warning "CUB requires at least Clang 7.0. Define 
>> CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>>      ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): 
>> warning #1835-D: attribute "warn_unused_result" does not apply here
>> 
>> In file included from 
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:
>> In file included from 
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:
>> In file included from 
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:
>> In file included from 
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:
>> In file included from 
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: 
>> warning: Thrust requires at least Clang 7.0. Define 
>> THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. 
>> [-W#pragma-messages]
>>      THRUST_COMPILER_DEPRECATION(Clang 7.0);
>>      ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: 
>> note: expanded from macro 'THRUST_COMPILER_DEPRECATION'
>>   THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define 
>> THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>>   ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: 
>> note: expanded from macro 'THRUST_COMP_DEPR_IMPL'
>> #  define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)
>>                                      ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: 
>> note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'
>> #  define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>>                                        ^
>> <scratch space>:149:6: note: expanded from here
>>  GCC warning "Thrust requires at least Clang 7.0. Define 
>> THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>>      ^
>> In file included from 
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:
>> In file included from 
>> /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:
>> In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB 
>> requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to 
>> suppress this message. [-W#pragma-messages]
>>      CUB_COMPILER_DEPRECATION(Clang 7.0);
>>      ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: 
>> expanded from macro 'CUB_COMPILER_DEPRECATION'
>>   CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define 
>> CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>>   ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: 
>> expanded from macro 'CUB_COMP_DEPR_IMPL'
>> #  define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)
>>                                   ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: 
>> expanded from macro 'CUB_COMP_DEPR_IMPL0'
>> #  define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>>                                     ^
>> <scratch space>:208:6: note: expanded from here
>>  GCC warning "CUB requires at least Clang 7.0. Define 
>> CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>>      ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): 
>> warning #1835-D: attribute "warn_unused_result" does not apply here
>> 
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:55:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(a); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:78:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(a); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:107:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(len); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:144:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(t); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:150:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(s); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:198:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(flg); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:249:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(n); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:251:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(s); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:291:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(n); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:330:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(t); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:333:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(a); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:334:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(b); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:367:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(a); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:368:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(b); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:369:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(tmp); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:403:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(haystack);
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:404:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(needle);
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:405:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(tmp); 
>>   ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:437:3: 
>> error: use of undeclared identifier '__builtin_assume'
>> ; __builtin_assume(t); 
>>   ^
>> fatal error: too many errors emitted, stopping now [-ferror-limit=]
>> 20 errors generated.
>> Error while processing /tmp/tmpxft_0001add6_00000000-6_curand2.cudafe1.cpp.
>> gmake[3]: *** [gmakefile:209: 
>> arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1
>> gmake[2]: *** 
>> [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28:
>>  libs] Error 2
>> **************************ERROR*************************************
>>   Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log
>>   Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to 
>> [email protected]
>> ********************************************************************
>> 
>> 
>>  
>> From: Junchao Zhang <[email protected]>
>> Sent: Monday, August 21, 2023 4:17 PM
>> To: Vanella, Marcos (Fed) <[email protected]>
>> Cc: PETSc users list <[email protected]>; Guan, Collin X. (Fed) 
>> <[email protected]>
>> Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi 
>> processes and 1 GPU
>>  
>> That is a good question. Looking at 
>> https://slurm.schedmd.com/gres.html#GPU_Management, I was wondering if you 
>> could share the output of your job so we can search for CUDA_VISIBLE_DEVICES 
>> and see how GPUs were allocated.
>> 
>> --Junchao Zhang
>> 
>> 
>> On Mon, Aug 21, 2023 at 2:38 PM Vanella, Marcos (Fed) 
>> <[email protected]> wrote:
>> Ok, thanks Junchao. So is GPU 0 actually allocating memory for the meshes of 
>> all 8 MPI processes but only working on 2 of them? 
>> The nvidia-smi listing says it has allocated 2.4 GB.
>> Best,
>> Marcos
>> From: Junchao Zhang <[email protected]>
>> Sent: Monday, August 21, 2023 3:29 PM
>> To: Vanella, Marcos (Fed) <[email protected]>
>> Cc: PETSc users list <[email protected]>; Guan, Collin X. (Fed) 
>> <[email protected]>
>> Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi 
>> processes and 1 GPU
>>  
>> Hi, Marcos,
>>   If you look at the PIDs in the nvidia-smi output, you will only find 8 
>> unique PIDs, which is expected since you allocated 8 MPI ranks per node.
>>   The duplicate PIDs are usually threads spawned by the MPI runtime (for 
>> example, progress threads in the MPI implementation). So your job script and 
>> output are all good.
>> 
>>   Thanks.
>> 
>> On Mon, Aug 21, 2023 at 2:00 PM Vanella, Marcos (Fed) 
>> <[email protected]> wrote:
>> Hi Junchao, something I'm noticing when running with the CUDA-enabled linear 
>> solvers (CG+HYPRE, CG+GAMG) is that, for multi-CPU/multi-GPU calculations, 
>> GPU 0 in the node seems to be taking all the sub-matrices corresponding to 
>> all the MPI processes in the node. This is the result of the nvidia-smi 
>> command on a node with 8 MPI processes (each advancing the same number of 
>> unknowns in the calculation) and 4 V100 GPUs:
>> 
>> Mon Aug 21 14:36:07 2023       
>> +---------------------------------------------------------------------------------------+
>> | NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA 
>> Version: 12.2     |
>> |-----------------------------------------+----------------------+----------------------+
>> | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile 
>> Uncorr. ECC |
>> | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  
>> Compute M. |
>> |                                         |                      |           
>>     MIG M. |
>> |=========================================+======================+======================|
>> |   0  Tesla V100-SXM2-16GB           On  | 00000004:04:00.0 Off |           
>>          0 |
>> | N/A   34C    P0              63W / 300W |   2488MiB / 16384MiB |      0%   
>>    Default |
>> |                                         |                      |           
>>        N/A |
>> +-----------------------------------------+----------------------+----------------------+
>> |   1  Tesla V100-SXM2-16GB           On  | 00000004:05:00.0 Off |           
>>          0 |
>> | N/A   38C    P0              56W / 300W |    638MiB / 16384MiB |      0%   
>>    Default |
>> |                                         |                      |           
>>        N/A |
>> +-----------------------------------------+----------------------+----------------------+
>> |   2  Tesla V100-SXM2-16GB           On  | 00000035:03:00.0 Off |           
>>          0 |
>> | N/A   35C    P0              52W / 300W |    638MiB / 16384MiB |      0%   
>>    Default |
>> |                                         |                      |           
>>        N/A |
>> +-----------------------------------------+----------------------+----------------------+
>> |   3  Tesla V100-SXM2-16GB           On  | 00000035:04:00.0 Off |           
>>          0 |
>> | N/A   38C    P0              53W / 300W |    638MiB / 16384MiB |      0%   
>>    Default |
>> |                                         |                      |           
>>        N/A |
>> +-----------------------------------------+----------------------+----------------------+
>>                                                                              
>>             
>> +---------------------------------------------------------------------------------------+
>> | Processes:                                                                 
>>            |
>> |  GPU   GI   CI        PID   Type   Process name                            
>> GPU Memory |
>> |        ID   ID                                                             
>> Usage      |
>> |=======================================================================================|
>> |    0   N/A  N/A    214626      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux  
>>     318MiB |
>> |    0   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux  
>>     308MiB |
>> |    0   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux  
>>     308MiB |
>> |    0   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux  
>>     308MiB |
>> |    0   N/A  N/A    214630      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux  
>>     318MiB |
>> |    0   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux  
>>     308MiB |
>> |    0   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux  
>>     308MiB |
>> |    0   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux  
>>     308MiB |
>> |    1   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux  
>>     318MiB |
>> |    1   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux  
>>     318MiB |
>> |    2   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux  
>>     318MiB |
>> |    2   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux  
>>     318MiB |
>> |    3   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux  
>>     318MiB |
>> |    3   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux  
>>     318MiB |
>> +---------------------------------------------------------------------------------------+
>> 
>> 
>> You can see that GPU 0 is connected to all 8 MPI processes, each taking about 
>> 300 MB on it, whereas GPUs 1, 2 and 3 are each working with 2 MPI processes. 
>> I'm wondering whether this is expected or whether there are changes I need to 
>> make to my submission script/runtime parameters.
>> This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPUs/node):
>> 
>> #!/bin/bash
>> # ../../Utilities/Scripts/qfds.sh -p 2  -T db -d test.fds
>> #SBATCH -J test 
>> #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err
>> #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log
>> #SBATCH --partition=gpu
>> #SBATCH --ntasks=16
>> #SBATCH --ntasks-per-node=8
>> #SBATCH --cpus-per-task=1
>> #SBATCH --nodes=2
>> #SBATCH --time=01:00:00
>> #SBATCH --gres=gpu:4
>> 
>> export OMP_NUM_THREADS=1
>> # modules
>> module load cuda/11.7
>> module load gcc/11.2.1/toolset
>> module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7
>> 
>> cd /home/mnv/Firemodels_fork/fds/Issues/PETSc
>> 
>> srun -N 2 -n 16 
>> /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux 
>> test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda
>>                                    
>> Thank you for the advice,
>> Marcos
>> 
>>  
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments 
>> is infinitely more interesting than any results to which their experiments 
>> lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/
