> On Aug 24, 2023, at 2:00 PM, Vanella, Marcos (Fed) <[email protected]> wrote:
>
> Thank you Barry, I will dial back the MPI_F08 use in our source code and try compiling it. I haven't found much information about using MPI and MPI_F08 in different modules, other than the following link from several years ago:
>
> https://users.open-mpi.narkive.com/eCCG36Ni/ompi-fortran-problem-when-mixing-use-mpi-and-use-mpi-f08-with-gfortran-5
>
> It looks like this has been fixed for openmpi and newer gfortran versions, because I don't have issues with that MPI lib/compiler combination. The same goes for openmpi/ifort.
> What I find quite interesting is this: I assumed the PRIVATE statement in a module would provide a backstop against the propagation of entities not explicitly listed in the module's PUBLIC statement, including those belonging to upstream modules made visible through USE. That does not seem to be the case here.
I agree; you had seemingly inconsistent results across your different tests. It could be bugs in the Fortran system's handling of modules.

> Best,
> Marcos
>
> From: Barry Smith <[email protected]>
> Sent: Thursday, August 24, 2023 12:40 PM
> To: Vanella, Marcos (Fed) <[email protected]>
> Cc: PETSc users list <[email protected]>; Guan, Collin X. (Fed) <[email protected]>
> Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
>
> PETSc uses the non-MPI_F08 Fortran modules, so I am guessing that when you also use the MPI_F08 modules the compiler sees two sets of interfaces for the same functions, hence the error. I am not sure it is portable to use PETSc with the F08 Fortran modules in the same program or routine.
>
>> On Aug 24, 2023, at 12:22 PM, Vanella, Marcos (Fed) via petsc-users <[email protected]> wrote:
>>
>> Thank you Matt and Junchao. I've been testing further with nvhpc on Summit. You might have an idea of what is going on here.
>> These are my modules:
>>
>> Currently Loaded Modules:
>>   1) lsf-tools/2.0   3) darshan-runtime/3.4.0-lite   5) DefApps       7) spectrum-mpi/10.4.0.3-20210112   9) nsight-systems/2021.3.1.54
>>   2) hsi/5.0.2.p5    4) xalt/1.2.1                   6) nvhpc/22.11   8) nsight-compute/2021.2.1         10) cuda/11.7.1
>>
>> I configured and compiled PETSc with these options:
>>
>> ./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda
>>
>> without issues. The MPI checks did not go through, as this was done on the login node.
>>
>> Then I started getting (similarly to what I saw with pgi and gcc on Summit) ambiguous-interface errors related to MPI routines. I was able to make a simple piece of code that reproduces this.
>> It has to do with having a USE PETSC statement in a module (TEST_MOD) and a USE MPI_F08 in the main program (MAIN) that uses that module, even though the PRIVATE statement has been used in said module (TEST_MOD).
>>
>> MODULE TEST_MOD
>> ! In this module we use PETSC.
>> USE PETSC
>> !USE MPI
>> IMPLICIT NONE
>> PRIVATE
>> PUBLIC :: TEST1
>>
>> CONTAINS
>> SUBROUTINE TEST1(A)
>> IMPLICIT NONE
>> REAL, INTENT(INOUT) :: A
>> INTEGER :: IERR
>> A=0.
>> ENDSUBROUTINE TEST1
>>
>> ENDMODULE TEST_MOD
>>
>>
>> PROGRAM MAIN
>>
>> ! Assume in main we use some MPI_F08 features.
>> USE MPI_F08
>> USE TEST_MOD, ONLY : TEST1
>> IMPLICIT NONE
>> INTEGER :: MY_RANK,IERR=0
>> INTEGER :: PNAMELEN=0
>> INTEGER :: PROVIDED
>> INTEGER, PARAMETER :: REQUIRED=MPI_THREAD_FUNNELED
>> REAL :: A=0.
>> CALL MPI_INIT_THREAD(REQUIRED,PROVIDED,IERR)
>> CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR)
>> CALL TEST1(A)
>> CALL MPI_FINALIZE(IERR)
>>
>> ENDPROGRAM MAIN
>>
>> Leaving the USE PETSC statement in TEST_MOD, this is what I get when trying to compile this code:
>>
>> vanellam@login5 test_spectrum_issue $ mpifort -c -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/" -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-c-opt-nvhpc/include" mpitest.f90
>> NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_init_thread (mpitest.f90: 34)
>> NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_finalize (mpitest.f90: 37)
>> 0 inform, 0 warnings, 2 severes, 0 fatal for main
>>
>> Now, if I replace USE PETSC with USE MPI in the module TEST_MOD, compilation proceeds correctly. If I leave the USE PETSC statement in the module and change the statement in MAIN to USE MPI, compilation also goes through. So it seems to be something related to using the PETSC and MPI_F08 modules together. My take is that it is related to spectrum-mpi, as I haven't had issues compiling FDS+PETSc with openmpi on other systems.
>>
>> Please let me know if you have any ideas on what might be going on. I'll move to Polaris and try with mpich too.
>>
>> Thanks!
>> Marcos
>>
>>
>> From: Junchao Zhang <[email protected]>
>> Sent: Tuesday, August 22, 2023 5:25 PM
>> To: Matthew Knepley <[email protected]>
>> Cc: Vanella, Marcos (Fed) <[email protected]>; PETSc users list <[email protected]>; Guan, Collin X. (Fed) <[email protected]>
>> Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
>>
>> Marcos,
>> Yes, refer to the example script Matt mentioned for Summit, and feel free to turn options on/off in the file. In my experience, gcc is easier to use. Also, I found https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus, which might be similar to your machine (4 GPUs per node). The key point is: the Cray MPI on Polaris does not currently support binding MPI ranks to GPUs. For applications that need this support, it can instead be handled by a small helper script that appropriately sets CUDA_VISIBLE_DEVICES for each MPI rank.
>> So you can try the helper script set_affinity_gpu_polaris.sh to set CUDA_VISIBLE_DEVICES manually. In other words, put the script on your PATH and then run your job with
>> srun -N 2 -n 16 set_affinity_gpu_polaris.sh /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda
>>
>> Then, check again with nvidia-smi to see if GPU memory is evenly allocated.
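[Editor's note] The wrapper approach described above can be sketched as follows. This is an assumption about what a script like set_affinity_gpu_polaris.sh does, not its actual contents; the environment-variable names (OMPI_COMM_WORLD_LOCAL_RANK, SLURM_LOCALID, PALS_LOCAL_RANKID) and the round-robin mapping are illustrative and depend on the MPI launcher in use:

```shell
#!/bin/bash
# Hypothetical per-rank GPU binding wrapper (the real set_affinity_gpu_polaris.sh
# may differ). Determine this process's local rank on the node; different MPI
# launchers export different variables, so try a few and default to 0.
local_rank=${OMPI_COMM_WORLD_LOCAL_RANK:-${SLURM_LOCALID:-${PALS_LOCAL_RANKID:-0}}}

# Count the GPUs on this node; fall back to 4 (the node layout discussed in
# this thread) if nvidia-smi is unavailable.
num_gpus=$(nvidia-smi -L 2>/dev/null | wc -l)
[ "$num_gpus" -gt 0 ] || num_gpus=4

# Round-robin mapping: each local rank sees exactly one GPU, so ranks spread
# evenly across devices instead of all defaulting to GPU 0.
export CUDA_VISIBLE_DEVICES=$((local_rank % num_gpus))

# Launch the actual application (e.g. the fds binary plus its arguments) with
# the restricted GPU visibility.
exec "$@"
```

Invoked as `srun ... wrapper.sh <binary> <args>`, each rank then inherits a single-GPU view before the application initializes CUDA.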
>> --Junchao Zhang
>>
>>
>> On Tue, Aug 22, 2023 at 3:03 PM Matthew Knepley <[email protected]> wrote:
>> On Tue, Aug 22, 2023 at 2:54 PM Vanella, Marcos (Fed) via petsc-users <[email protected]> wrote:
>> Hi Junchao, on our system neither slurm's scontrol show job_id -dd nor looking at CUDA_VISIBLE_DEVICES provides information about which MPI process is associated with which GPU in the node. I can see this with nvidia-smi, but if you have any other suggestion using slurm I would like to hear it.
>>
>> I've been trying to compile the code+PETSc on Summit, but have been having all sorts of issues related to spectrum-mpi and the different compilers they provide (I tried gcc, nvhpc, pgi, and xl; some of them don't handle Fortran 2018, others give errors about repeated MPI definitions, etc.).
>>
>> The PETSc configure examples are in the repository:
>>
>> https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads
>>
>> Thanks,
>>
>> Matt
>>
>> I also wanted to ask you: do you know if it is possible to compile PETSc with the xl/16.1.1-10 suite?
>>
>> Thanks!
>>
>> I configured the library --with-cuda, and when compiling I get a compilation error from CUDAC:
>>
>> CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o
>> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:
>> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:
>> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:
>> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:
>> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]
>>   THRUST_COMPILER_DEPRECATION(Clang 7.0);
>>   ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION'
>>   THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>>   ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL'
>> #  define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)
>>                                      ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'
>> #  define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>>                                        ^
>> <scratch space>:141:6: note: expanded from here
>>  GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>>      ^
>> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:
>> In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:
>> In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]
>>   CUB_COMPILER_DEPRECATION(Clang 7.0);
>>   ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION'
>>   CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>>   ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL'
>> #  define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)
>>                                   ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0'
>> #  define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>>                                     ^
>> <scratch space>:198:6: note: expanded from here
>>  GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>>      ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here
>>
>> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:
>> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:
>> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:
>> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:
>> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]
>>   THRUST_COMPILER_DEPRECATION(Clang 7.0);
>>   ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION'
>>   THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>>   ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL'
>> #  define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)
>>                                      ^
>> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'
>> #  define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>>                                        ^
>> <scratch space>:149:6: note: expanded from here
>>  GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>>      ^
>> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:
>> In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:
>> In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:
>> In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]
>>   CUB_COMPILER_DEPRECATION(Clang 7.0);
>>   ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION'
>>   CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
>>   ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL'
>> #  define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)
>>                                   ^
>> /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0'
>> #  define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
>>                                     ^
>> <scratch space>:208:6: note: expanded from here
>>  GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
>>      ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here
>>
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:55:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(a);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:78:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(a);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:107:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(len);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:144:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(t);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:150:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(s);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:198:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(flg);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:249:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(n);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:251:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(s);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:291:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(n);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:330:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(t);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:333:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(a);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:334:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(b);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:367:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(a);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:368:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(b);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:369:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(tmp);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:403:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(haystack);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:404:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(needle);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:405:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(tmp);
>>     ^
>> /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:437:3: error: use of undeclared identifier '__builtin_assume'
>>   ; __builtin_assume(t);
>>     ^
>> fatal error: too many errors emitted, stopping now [-ferror-limit=]
>> 20 errors generated.
>> Error while processing /tmp/tmpxft_0001add6_00000000-6_curand2.cudafe1.cpp.
>> gmake[3]: *** [gmakefile:209: arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1
>> gmake[2]: *** [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2
>> **************************ERROR*************************************
>> Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log
>> Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to [email protected]
>> ********************************************************************
>>
>>
>> From: Junchao Zhang <[email protected]>
>> Sent: Monday, August 21, 2023 4:17 PM
>> To: Vanella, Marcos (Fed) <[email protected]>
>> Cc: PETSc users list <[email protected]>; Guan, Collin X. (Fed) <[email protected]>
>> Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
>>
>> That is a good question. Looking at https://slurm.schedmd.com/gres.html#GPU_Management, I was wondering if you could share the output of your job so we can search for CUDA_VISIBLE_DEVICES and see how the GPUs were allocated.
>>
>> --Junchao Zhang
>>
>>
>> On Mon, Aug 21, 2023 at 2:38 PM Vanella, Marcos (Fed) <[email protected]> wrote:
>> Ok, thanks Junchao. So is GPU 0 actually allocating memory for the meshes of all 8 MPI processes but only working on 2 of them? The output says it has allocated 2.4 GB.
>> Best,
>> Marcos
>>
>> From: Junchao Zhang <[email protected]>
>> Sent: Monday, August 21, 2023 3:29 PM
>> To: Vanella, Marcos (Fed) <[email protected]>
>> Cc: PETSc users list <[email protected]>; Guan, Collin X. (Fed) <[email protected]>
>> Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
>>
>> Hi, Marcos,
>> If you look at the PIDs of the nvidia-smi output, you will only find 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node. The duplicate PIDs are usually for threads spawned by the MPI runtime (for example, progress threads in the MPI implementation). So your job script and output are all good.
>>
>> Thanks.
>>
>> On Mon, Aug 21, 2023 at 2:00 PM Vanella, Marcos (Fed) <[email protected]> wrote:
>> Hi Junchao, something I'm noticing when running with CUDA-enabled linear solvers (CG+HYPRE, CG+GAMG) is that for multi-CPU/multi-GPU calculations, GPU 0 in the node seems to take the sub-matrices corresponding to all the MPI processes in the node. This is the result of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 V100 GPUs:
>>
>> Mon Aug 21 14:36:07 2023
>> +---------------------------------------------------------------------------------------+
>> | NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
>> |-----------------------------------------+----------------------+----------------------+
>> | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
>> | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
>> |                                         |                      |               MIG M. |
>> |=========================================+======================+======================|
>> |   0  Tesla V100-SXM2-16GB           On  | 00000004:04:00.0 Off |                    0 |
>> | N/A   34C    P0              63W / 300W |   2488MiB / 16384MiB |      0%      Default |
>> |                                         |                      |                  N/A |
>> +-----------------------------------------+----------------------+----------------------+
>> |   1  Tesla V100-SXM2-16GB           On  | 00000004:05:00.0 Off |                    0 |
>> | N/A   38C    P0              56W / 300W |    638MiB / 16384MiB |      0%      Default |
>> |                                         |                      |                  N/A |
>> +-----------------------------------------+----------------------+----------------------+
>> |   2  Tesla V100-SXM2-16GB           On  | 00000035:03:00.0 Off |                    0 |
>> | N/A   35C    P0              52W / 300W |    638MiB / 16384MiB |      0%      Default |
>> |                                         |                      |                  N/A |
>> +-----------------------------------------+----------------------+----------------------+
>> |   3  Tesla V100-SXM2-16GB           On  | 00000035:04:00.0 Off |                    0 |
>> | N/A   38C    P0              53W / 300W |    638MiB / 16384MiB |      0%      Default |
>> |                                         |                      |                  N/A |
>> +-----------------------------------------+----------------------+----------------------+
>>
>> +---------------------------------------------------------------------------------------+
>> | Processes:                                                                            |
>> |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
>> |        ID   ID                                                             Usage      |
>> |=======================================================================================|
>> |    0   N/A  N/A    214626      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
>> |    0   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
>> |    0   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
>> |    0   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
>> |    0   N/A  N/A    214630      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
>> |    0   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
>> |    0   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
>> |    0   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
>> |    1   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
>> |    1   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
>> |    2   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
>> |    2   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
>> |    3   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
>> |    3   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
>> +---------------------------------------------------------------------------------------+
>>
>> You can see that GPU 0 is connected to all 8 MPI processes, each taking about 300 MiB on it, whereas GPUs 1, 2, and 3 are each working with 2 MPI processes. I'm wondering if this is expected or whether there are changes I need to make to my submission script/runtime parameters.
>> This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPUs/node):
>>
>> #!/bin/bash
>> # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds
>> #SBATCH -J test
>> #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err
>> #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log
>> #SBATCH --partition=gpu
>> #SBATCH --ntasks=16
>> #SBATCH --ntasks-per-node=8
>> #SBATCH --cpus-per-task=1
>> #SBATCH --nodes=2
>> #SBATCH --time=01:00:00
>> #SBATCH --gres=gpu:4
>>
>> export OMP_NUM_THREADS=1
>> # modules
>> module load cuda/11.7
>> module load gcc/11.2.1/toolset
>> module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7
>>
>> cd /home/mnv/Firemodels_fork/fds/Issues/PETSc
>>
>> srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda
>>
>> Thank you for the advice,
>> Marcos
>>
>>
>> --
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
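[Editor's note] A quick way to check the per-GPU process distribution discussed in this thread, without reading the full nvidia-smi table, is to count compute processes per GPU. A minimal sketch follows; on a live node the commented `nvidia-smi --query-compute-apps` call would supply the data, while here an illustrative sample (hypothetical UUIDs, shaped like the imbalanced run above) is parsed instead:

```shell
#!/bin/bash
# On a live node, the per-GPU process list could come from:
#   nvidia-smi --query-compute-apps=gpu_uuid,pid --format=csv,noheader
# Here we parse an illustrative captured sample (hypothetical UUIDs):
# 4 ranks landed on GPU-aaa, 2 on GPU-bbb.
sample='GPU-aaa, 214626
GPU-aaa, 214627
GPU-aaa, 214628
GPU-aaa, 214629
GPU-bbb, 214627
GPU-bbb, 214631'

# Count processes per GPU UUID: first CSV field, sorted, then uniq -c.
counts=$(printf '%s\n' "$sample" | cut -d, -f1 | sort | uniq -c)
echo "$counts"
```

An even distribution shows the same count for every GPU UUID; a skewed count like the one above reproduces the "all ranks on GPU 0" symptom at a glance.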
