Yeah, Stefano mentioned this, and I would also like to see this not be a fatal error.

On Mon, Nov 15, 2021 at 9:26 AM Jacob Faibussowitsch <[email protected]> wrote:

> > [0]PETSC ERROR: PETSc is configured with GPU support, but your MPI is
>> not GPU-aware. For better performance, please use a GPU-aware MPI.
>> > [0]PETSC ERROR: If you do not care, add option -use_gpu_aware_mpi 0. To
>> not see the message again, add the option to your .petscrc, OR add it to
>> the env var PETSC_OPTIONS.
>> > [0]PETSC ERROR: If you do care, for IBM Spectrum MPI on OLCF Summit,
>> you may need jsrun --smpiargs=-gpu.
>> > [0]PETSC ERROR: For OpenMPI, you need to configure it --with-cuda (
>> https://www.open-mpi.org/faq/?category=buildcuda)
>> > [0]PETSC ERROR: For MVAPICH2-GDR, you need to set MV2_USE_CUDA=1 (
>> http://mvapich.cse.ohio-state.edu/userguide/gdr/)
>> > [0]PETSC ERROR: For Cray-MPICH, you need to set
>> MPICH_RDMA_ENABLED_CUDA=1 (
>> https://www.olcf.ornl.gov/tutorials/gpudirect-mpich-enabled-cuda/)
>>
>
> You seem to also be tripping up the GPU-aware MPI checker. IIRC we
> discussed removing this at some point? I think Stefano mentioned we now do
> this check at configure time?
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
>
> On Nov 13, 2021, at 22:57, Junchao Zhang <[email protected]> wrote:
>
> On Sat, Nov 13, 2021 at 2:24 PM Mark Adams <[email protected]> wrote:
>
>> I have a user that wants CUDA + Hypre on Summit and they want to use
>> OpenMP in their code. I configured with openmp but without thread safety
>> and got this error.
>>
>> Maybe there is no need for us to do anything with omp in our
>> configuration. Not sure.
>>
>> 15:08 main= summit:/gpfs/alpine/csc314/scratch/adams/petsc$ make
>> PETSC_DIR=/gpfs/alpine/world-shared/geo127/petsc/arch-opt-gcc9.1.0-omp-cuda11.0.3
>> PETSC_ARCH="" check
>> Running check examples to verify correct installation
>> Using
>> PETSC_DIR=/gpfs/alpine/world-shared/geo127/petsc/arch-opt-gcc9.1.0-omp-cuda11.0.3
>> and PETSC_ARCH=
>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
>> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
>> See http://www.mcs.anl.gov/petsc/documentation/faq.html
>> [1] (280696) Warning: Could not find key lid0:0:2 in cache
>> <=========================
>> [1] (280696) Warning: Could not find key qpn0:0:0:2 in cache
>> <=========================
>> Unable to connect queue-pairs
>> [h37n08:280696] Error: common_pami.c:1094 - ompi_common_pami_init() 1:
>> Unable to create 1 PAMI communication context(s) rc=1
>>
> I don't know what petsc's thread safety is. But this error seems to be in
> the environment. You can report to OLCF help.
>
>
>> --------------------------------------------------------------------------
>> No components were able to be opened in the pml framework.
>>
>> This typically means that either no components of this type were
>> installed, or none of the installed components can be loaded.
>> Sometimes this means that shared libraries required by these
>> components are unable to be found/loaded.
>>
>> Host: h37n08
>> Framework: pml
>> --------------------------------------------------------------------------
>> [h37n08:280696] PML pami cannot be selected
>> 1,5c1,16
>> < lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
>> < 0 SNES Function norm 0.0406612
>> < 1 SNES Function norm 4.12227e-06
>> < 2 SNES Function norm 6.098e-11
>> < Number of SNES iterations = 2
>> ---
>> > [1] (280721) Warning: Could not find key lid0:0:2 in cache
>> <=========================
>> > [1] (280721) Warning: Could not find key qpn0:0:0:2 in cache
>> <=========================
>> > Unable to connect queue-pairs
>> > [h37n08:280721] Error: common_pami.c:1094 - ompi_common_pami_init() 1:
>> Unable to create 1 PAMI communication context(s) rc=1
>> >
>> --------------------------------------------------------------------------
>> > No components were able to be opened in the pml framework.
>> >
>> > This typically means that either no components of this type were
>> > installed, or none of the installed components can be loaded.
>> > Sometimes this means that shared libraries required by these
>> > components are unable to be found/loaded.
>> >
>> > Host: h37n08
>> > Framework: pml
>> >
>> --------------------------------------------------------------------------
>> > [h37n08:280721] PML pami cannot be selected
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials
>> Possible problem with ex19 running with hypre, diffs above
>> =========================================
>> 2,15c2,15
>> < 0 SNES Function norm 2.391552133017e-01
>> < 0 KSP Residual norm 2.325621076120e-01
>> < 1 KSP Residual norm 1.654206318674e-02
>> < 2 KSP Residual norm 7.202836119880e-04
>> < 3 KSP Residual norm 1.796861424199e-05
>> < 4 KSP Residual norm 2.461332992052e-07
>> < 1 SNES Function norm 6.826585648929e-05
>> < 0 KSP Residual norm 2.347339172985e-05
>> < 1 KSP Residual norm 8.356798075993e-07
>> < 2 KSP Residual norm 1.844045309619e-08
>> < 3 KSP Residual norm 5.336386977405e-10
>> < 4 KSP Residual norm 2.662608472862e-11
>> < 2 SNES Function norm 6.549682264799e-11
>> < Number of SNES iterations = 2
>> ---
>> > [0]PETSC ERROR: PETSc is configured with GPU support, but your MPI is
>> not GPU-aware. For better performance, please use a GPU-aware MPI.
>> > [0]PETSC ERROR: If you do not care, add option -use_gpu_aware_mpi 0. To
>> not see the message again, add the option to your .petscrc, OR add it to
>> the env var PETSC_OPTIONS.
>> > [0]PETSC ERROR: If you do care, for IBM Spectrum MPI on OLCF Summit,
>> you may need jsrun --smpiargs=-gpu.
>> > [0]PETSC ERROR: For OpenMPI, you need to configure it --with-cuda (
>> https://www.open-mpi.org/faq/?category=buildcuda)
>> > [0]PETSC ERROR: For MVAPICH2-GDR, you need to set MV2_USE_CUDA=1 (
>> http://mvapich.cse.ohio-state.edu/userguide/gdr/)
>> > [0]PETSC ERROR: For Cray-MPICH, you need to set
>> MPICH_RDMA_ENABLED_CUDA=1 (
>> https://www.olcf.ornl.gov/tutorials/gpudirect-mpich-enabled-cuda/)
>> >
>> --------------------------------------------------------------------------
>> > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
>> > with errorcode 76.
>> >
>> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> > You may or may not see output from other processes, depending on
>> > exactly when Open MPI kills them.
>> >
>> --------------------------------------------------------------------------
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials
>> Possible problem with ex19 running with cuda, diffs above
>> =========================================
>>
>
