Hello Gilles,

The gromacs-2024.4 build detects CUDA-aware MPI at the cmake stage:
-- Performing Test HAVE_MPI_EXT
-- Performing Test HAVE_MPI_EXT - Success
-- Performing Test MPI_SUPPORTS_CUDA_AWARE_DETECTION
-- Performing Test MPI_SUPPORTS_CUDA_AWARE_DETECTION - Success
-- Checking for MPI_SUPPORTS_CUDA_AWARE_DETECTION - yes

But at runtime GROMACS does not detect it. The Open MPI MCA/runtime options used are:

--mca btl ofi --mca coll ^hcoll -x GMX_ENABLE_DIRECT_GPU_COMM=true -x PATH -x LD_LIBRARY_PATH -hostfile s_hosts2 -np ${s_nmpi} --map-by numa --bind-to numa

This fails with the following segmentation fault:

--------------------------------------------------------------------------
No components were able to be opened in the btl framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      g100n052
  Framework: btl
--------------------------------------------------------------------------
[g100n052:14172:0:14172] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[g100n052:14176:0:14176] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))

But 'ompi_info' shows the "btl ofi" component is available. The other notable point is that single-node jobs with multiple GPUs work fine: on a single node GROMACS detects GPU-aware MPI and the performance is good, as expected. Only on more than one node does it fail with the segfault above.

On a single node, is it using XPMEM for communication? Is there any Open MPI environment variable or setting that shows which transport is being used for communication between GPUs and between MPI ranks?
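If it helps with debugging, I can re-run with the component-selection verbosity raised so that the chosen pml/mtl/btl components are printed at startup, and re-check that the Open MPI library itself reports CUDA support. A rough sketch only; the verbosity parameters and the ompi_info check are standard Open MPI knobs and not something I have run yet, and the rest of the run line is the same as above:

  # sketch: print which pml/mtl/btl components Open MPI selects at startup
  mpirun --mca pml_base_verbose 100 --mca mtl_base_verbose 100 \
         --mca btl_base_verbose 100 \
         <same transport/mapping options and hostfile as above> ...

  # sketch: confirm the Open MPI library itself was built with CUDA support
  ompi_info --parsable --all | grep mpi_built_with_cuda_support:value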
Thanks

On Sun, Mar 30, 2025 at 10:21 AM Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> Sangam,
>
> The issue should have been fixed in Open MPI 5.0.6.
>
> Anyway, are you certain Open MPI is not GPU aware and this is not
> cmake/GROMACS that failed to detect it?
>
> What if you "configure" GROMACS with
>     cmake -DGMX_FORCE_GPU_AWARE_MPI=ON ...
>
> If the problem persists, please open an issue at
> https://github.com/open-mpi/ompi/issues and do provide the required
> information.
>
> Cheers,
>
> Gilles
>
> On Sun, Mar 30, 2025 at 12:08 AM Sangam B <forum....@gmail.com> wrote:
>
>> Hi,
>>
>> OpenMPI-5.0.5 and 5.0.6 fail with the following error during the
>> "make" stage of the build procedure:
>>
>> In file included from ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:51,
>>                  from ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.c:13:
>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h: In function ‘ompi_mtl_ofi_context_progress’:
>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:5: warning: implicit declaration of function ‘container_of’ [-Wimplicit-function-declaration]
>>    19 | container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>       | ^~~~~~~~~~~~
>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:152:27: note: in expansion of macro ‘TO_OFI_REQ’
>>   152 | ofi_req = TO_OFI_REQ(ompi_mtl_ofi_wc[i].op_context);
>>       | ^~~~~~~~~~
>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:30: error: expected expression before ‘struct’
>>    19 | container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>       | ^~~~~~
>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:152:27: note: in expansion of macro ‘TO_OFI_REQ’
>>   152 | ofi_req = TO_OFI_REQ(ompi_mtl_ofi_wc[i].op_context);
>>       | ^~~~~~~~~~
>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:30: error: expected expression before ‘struct’
>>    19 | container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>       | ^~~~~~
>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:200:19: note: in expansion of macro ‘TO_OFI_REQ’
>>   200 | ofi_req = TO_OFI_REQ(error.op_context);
>>       | ^~~~~~~~~~
>> make[2]: *** [Makefile:1603: mtl_ofi.lo] Error 1
>>
>> OpenMPI-5.0.7 gets past this error, but it is not able to build CUDA
>> [GPU Direct] & ofi support:
>>
>> GROMACS complains that it is not able to detect CUDA-aware MPI:
>>
>> GPU-aware MPI was not detected, will not use direct GPU communication.
>> Check the GROMACS install guide for recommendations for GPU-aware support.
>> If you are certain about GPU-aware support in your MPI library, you can
>> force its use by setting the GMX_FORCE_GPU_AWARE_MPI environment variable.
>>
>> OpenMPI is configured like this:
>>
>> '--disable-opencl' '--with-slurm' '--without-lsf' '--without-opencl'
>> '--with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/25.1/cuda/12.6'
>> '--without-rocm'
>> '--with-knem=/opt/knem-1.1.4.90mlnx3'
>> '--with-xpmem=/sw/openmpi/5.0.7/g133cu126_ubu2404/xpmem/2.7.3/'
>> '--with-xpmem-libdir=/sw/openmpi/5.0.7/g133cu126_ubu2404/xpmem/2.7.3//lib'
>> '--with-ofi=/sw/openmpi/5.0.7/g133cu126_ubu2404/ofi/2.0.0/c126g25xu118'
>> '--with-ofi-libdir=/sw/openmpi/5.0.7/g133cu126_ubu2404/ofi/2.0.0/c126g25xu118/lib'
>> '--enable-mca-no-build=btl-usnic'
>>
>> Can somebody help me build a working CUDA-aware Open MPI here?
>>
>> Thanks
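Following up on the GMX_FORCE_GPU_AWARE_MPI hint in the GROMACS message quoted above: would it also make sense to force GPU-aware MPI at run time by exporting that variable on the run line? A sketch only; the mdrun binary and its arguments are assumed here, as they are not shown in my run line above:

  mpirun <same MCA/transport and mapping options as above> \
         -x GMX_ENABLE_DIRECT_GPU_COMM=true \
         -x GMX_FORCE_GPU_AWARE_MPI=1 \
         gmx_mpi mdrun ...   # binary/arguments assumed; per the GROMACS message the variable only needs to be set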