Try running with --mca btl_base_verbose 100 — that will enable much more verbose output and hopefully show what’s going wrong.
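For example (a sketch; substitute your own test binary):

    mpirun --mca pml ob1 --mca btl self,ofi \
           --mca btl_base_verbose 100 \
           -np 4 test_ompi507stub.exe

Two side notes: the ob1 PML generally needs the self BTL so that a rank can reach itself, so it is worth listing it explicitly rather than restricting the framework to ofi alone; and if you are exercising the cm PML / ofi MTL path instead, the analogous knob is --mca mtl_base_verbose 100.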
> On Mar 31, 2025, at 8:57 AM, Sangam B <forum....@gmail.com> wrote:
>
> Hello Gilles,
>
> No, it does not work even on a single node:
>
> mpirun --mca pml ob1 --mca btl ofi -np 4 test_ompi507stub.exe
> --------------------------------------------------------------------------
> No components were able to be opened in the btl framework.
>
> This typically means that either no components of this type were
> installed, or none of the installed components can be loaded.
> Sometimes this means that shared libraries required by these
> components are unable to be found/loaded.
>
>   Host:      g100n052
>   Framework: btl
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> MPI_INIT has failed because at least one MPI process is unreachable
> from another.  This *usually* means that an underlying communication
> plugin -- such as a BTL or an MTL -- has either not loaded or not
> allowed itself to be used.  Your MPI job will now abort.
>
> You may wish to try to narrow down the problem;
>
> * Check the output of ompi_info to see which BTL/MTL plugins are
>   available.
> * Run your application with MPI_THREAD_SINGLE.
> * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>   if using MTL-based communications) to see exactly which
>   communication plugins were considered and/or discarded.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: ompi_mpi_instance_init failed
>   --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> [g100n052:00000] *** An error occurred in MPI_Init
> [g100n052:00000] *** reported by process [901316609,0]
> [g100n052:00000] *** on a NULL communicator
> [g100n052:00000] *** Unknown error
> [g100n052:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [g100n052:00000] ***    and MPI will try to terminate your MPI job as well)
> --------------------------------------------------------------------------
> prterun has exited due to process rank 0 with PID 0 on node g100n052 calling
> "abort". This may have caused other processes in the application to be
> terminated by signals sent by prterun (as reported here).
> --------------------------------------------------------------------------
>
> ompi_info shows:
>
>   MCA accelerator: null (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>   MCA accelerator: cuda (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>   MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA btl: self (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>   MCA btl: ofi (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>   MCA btl: sm (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>   MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>   MCA btl: uct (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>   MCA btl: smcuda (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>   MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>   MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA mpool: hugepage (MCA v2.1.0, API v3.1.0, Component v5.0.7)
>   MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>   MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>   MCA rcache: gpusm (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>   MCA rcache: rgpusm (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>   MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA reachable: netlink (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>   MCA smsc: knem (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>   MCA smsc: xpmem (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>   MCA threads: pthreads (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>   MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA bml: r2 (MCA v2.1.0, API v2.1.0, Component v5.0.7)
>   MCA coll: adapt (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>   MCA coll: basic (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>   MCA coll: han (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>   MCA coll: inter (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>   MCA coll: libnbc (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>   MCA coll: self (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>   MCA coll: sync (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>   MCA coll: tuned (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>   MCA coll: cuda (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>   MCA coll: ftagree (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>   MCA coll: hcoll (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>   MCA coll: monitoring (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>   MCA coll: sm (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>   MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA fs: lustre (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>   MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA io: romio341 (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA mtl: ofi (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA op: avx (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>   MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v5.0.7)
>   MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component v5.0.7)
>   MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v5.0.7)
>   MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v5.0.7)
>   MCA part: persist (MCA v2.1.0, API v4.0.0, Component v5.0.7)
>   MCA pml: cm (MCA v2.1.0, API v2.1.0, Component v5.0.7)
>   MCA pml: monitoring (MCA v2.1.0, API v2.1.0, Component v5.0.7)
>   MCA pml: ob1 (MCA v2.1.0, API v2.1.0, Component v5.0.7)
>   MCA pml: ucx (MCA v2.1.0, API v2.1.0, Component v5.0.7)
>   MCA pml: v (MCA v2.1.0, API v2.1.0, Component v5.0.7)
>   MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>   MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v5.0.7)
>   MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component v5.0.7)
>   MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>
> OpenMPI is configured like this:
>
> Configure command line:
>   '--prefix=/sw/openmpi/5.0.7/g133cu126stubU2404/xp_minu118ofi2'
>   '--without-lsf'
>   '--with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/25.1/cuda/12.6'
>   '--with-cuda-libdir=/opt/nvidia/hpc_sdk/Linux_x86_64/25.1/cuda/12.6/targets/x86_64-linux/lib/stubs'
>   '--with-knem=/opt/knem-1.1.4.90mlnx3'
>   '--with-xpmem=/sw/openmpi/5.0.7/g133cu126stubU2404/xpmem/2.7.3/'
>   '--with-xpmem-libdir=/sw/openmpi/5.0.7/g133cu126stubU2404/xpmem/2.7.3//lib'
>   '--with-ofi=/sw/openmpi/5.0.7/g133cu126stubU2404/ofi/2.0.0/c126g25xu118'
>   '--with-ofi-libdir=/sw/openmpi/5.0.7/g133cu126stubU2404/ofi/2.0.0/c126g25xu118/lib'
>   '--enable-mca-no-build=btl-usnic'
>
> UCX-1.18.0 is configured like this:
>
>   ../../configure --prefix=${s_pfix} \
>       --enable-mt \
>       --without-rocm \
>       --with-cuda=${cuda_path} \
>       --with-knem=${knem_path} \
>       --with-xpmem=${xpmem_path}
>
> OFI [libfabric-2.0.0] is configured like this:
>
>   ./configure \
>       --prefix=${s_pfix} \
>       --enable-shm=dl \
>       --enable-sockets=dl \
>       --enable-udp=dl \
>       --enable-tcp=dl \
>       --enable-rxm=dl \
>       --enable-rxd=dl \
>       --enable-verbs=dl \
>       --enable-psm2=no \
>       --enable-psm3=no \
>       --enable-ucx=dl:${ucx_path} \
>       --enable-gdrcopy-dlopen --with-gdrcopy=/usr \
>       --enable-cuda-dlopen --with-cuda=${cuda_path} \
>       --enable-xpmem=${xpmem_path} > config.out
>
> With some verbose settings, it is not able to initialize the btl component ofi:
>
> [g100n052:24169] mca: base: components_register: component ofi register function successful
> [g100n052:24169] mca: base: components_open: opening btl components
> [g100n052:24169] mca: base: components_open: found loaded component ofi
> [g100n052:24169] mca: base: components_open: component ofi open function successful
> [g100n052:24172] select: initializing btl component ofi
> [g100n052:24171] select: initializing btl component ofi
> [g100n052:24170] select: initializing btl component ofi
> [g100n052:24169] select: initializing btl component ofi
> [g100n052:24172] select: init of component ofi returned failure
> [g100n052:24171] select: init of component ofi returned failure
> [g100n052:24169] select: init of component ofi returned failure
> [g100n052:24170] select: init of component ofi returned failure
> [g100n052:24172] mca: base: close: component ofi closed
>
> Thanks
>
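Those "select: init of component ofi returned failure" lines usually mean libfabric did not offer a provider that btl/ofi can use; that is my assumption here, not a confirmed diagnosis. A quick way to check is libfabric's own fi_info utility (installed under your libfabric prefix's bin directory), run against the same library you built Open MPI with:

    # Use the libfabric that Open MPI was configured against
    export LD_LIBRARY_PATH=/sw/openmpi/5.0.7/g133cu126stubU2404/ofi/2.0.0/c126g25xu118/lib:$LD_LIBRARY_PATH
    fi_info -l            # short list of providers that open successfully
    fi_info -p verbs      # full details for one provider, e.g. verbs

Since all the providers were built as loadable plugins (--enable-...=dl), any provider whose dependencies fail to dlopen will simply be absent from that list, and btl/ofi would then fail init exactly as shown above.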
> On Mon, Mar 31, 2025 at 12:25 PM Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>> Sangam,
>>
>> What if you run a simple MPI hello world program with
>>
>>   mpirun --mca pml ob1 --mca btl ofi ...
>>
>> on one and several nodes?
>>
>> Cheers,
>>
>> Gilles
>>
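For reference, a minimal hello world of the kind Gilles means is enough; this is a generic sketch (the file name hello.c is arbitrary), plain MPI C with nothing Open MPI specific in it:

    /* hello.c: smallest useful MPI smoke test */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);                 /* the call that aborts in the logs above */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this rank's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

Build it with mpicc hello.c -o hello, then run it with the same MCA settings (mpirun --mca pml ob1 --mca btl self,ofi -np 4 ./hello), first on one node and then across two.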
>> On Mon, Mar 31, 2025 at 3:48 PM Sangam B <forum....@gmail.com> wrote:
>>> Hello Gilles,
>>>
>>> The gromacs-2024.4 build at the cmake stage shows that CUDA-aware MPI is detected:
>>>
>>> -- Performing Test HAVE_MPI_EXT
>>> -- Performing Test HAVE_MPI_EXT - Success
>>> -- Performing Test MPI_SUPPORTS_CUDA_AWARE_DETECTION
>>> -- Performing Test MPI_SUPPORTS_CUDA_AWARE_DETECTION - Success
>>> -- Checking for MPI_SUPPORTS_CUDA_AWARE_DETECTION - yes
>>>
>>> But at runtime it is not able to detect it.
>>>
>>> The OpenMPI MCA transports used are:
>>>
>>>   --mca btl ofi --mca coll ^hcoll -x GMX_ENABLE_DIRECT_GPU_COMM=true \
>>>   -x PATH -x LD_LIBRARY_PATH -hostfile s_hosts2 -np ${s_nmpi} \
>>>   --map-by numa --bind-to numa
>>>
>>> This fails with the following seg-fault error:
>>>
>>> --------------------------------------------------------------------------
>>> No components were able to be opened in the btl framework.
>>>
>>> This typically means that either no components of this type were
>>> installed, or none of the installed components can be loaded.
>>> Sometimes this means that shared libraries required by these
>>> components are unable to be found/loaded.
>>>
>>>   Host:      g100n052
>>>   Framework: btl
>>> --------------------------------------------------------------------------
>>> [g100n052:14172:0:14172] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
>>> [g100n052:14176:0:14176] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
>>>
>>> But 'ompi_info' shows "btl ofi" is available.
>>>
>>> The other notable point is that single-node jobs with multiple GPUs work fine: on a single node GROMACS detects GPU-aware MPI and the performance is good, as expected.
>>>
>>> It fails with the above seg-fault error only on more than one node.
>>>
>>> On a single node, is it using XPMEM for communication?
>>>
>>> Is there any OpenMPI env variable to show what transport is being used for communication between GPUs and between MPI ranks?
>>>
>>> Thanks
>>>
>>> On Sun, Mar 30, 2025 at 10:21 AM Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>>> Sangam,
>>>>
>>>> The issue should have been fixed in Open MPI 5.0.6.
>>>>
>>>> Anyway, are you certain Open MPI is not GPU-aware, and that it is not cmake/GROMACS that failed to detect it?
>>>>
>>>> What if you "configure" GROMACS with
>>>>
>>>>   cmake -DGMX_FORCE_GPU_AWARE_MPI=ON ...
>>>>
>>>> If the problem persists, please open an issue at https://github.com/open-mpi/ompi/issues and do provide the required information.
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Sun, Mar 30, 2025 at 12:08 AM Sangam B <forum....@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> OpenMPI-5.0.5 and 5.0.6 fail with the following error during the "make" stage of the build procedure:
>>>>>
>>>>> In file included from ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:51,
>>>>>                  from ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.c:13:
>>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h: In function ‘ompi_mtl_ofi_context_progress’:
>>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:5: warning: implicit declaration of function ‘container_of’ [-Wimplicit-function-declaration]
>>>>>    19 |     container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>>>>       |     ^~~~~~~~~~~~
>>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:152:27: note: in expansion of macro ‘TO_OFI_REQ’
>>>>>   152 |                 ofi_req = TO_OFI_REQ(ompi_mtl_ofi_wc[i].op_context);
>>>>>       |                           ^~~~~~~~~~
>>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:30: error: expected expression before ‘struct’
>>>>>    19 |     container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>>>>       |                              ^~~~~~
>>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:152:27: note: in expansion of macro ‘TO_OFI_REQ’
>>>>>   152 |                 ofi_req = TO_OFI_REQ(ompi_mtl_ofi_wc[i].op_context);
>>>>>       |                           ^~~~~~~~~~
>>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:30: error: expected expression before ‘struct’
>>>>>    19 |     container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>>>>       |                              ^~~~~~
>>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:200:19: note: in expansion of macro ‘TO_OFI_REQ’
>>>>>   200 |         ofi_req = TO_OFI_REQ(error.op_context);
>>>>>       |                   ^~~~~~~~~~
>>>>> make[2]: *** [Makefile:1603: mtl_ofi.lo] Error 1
>>>>>
>>>>> OpenMPI-5.0.7 gets past this error, but it is not able to build CUDA [GPU Direct] & ofi support.
>>>>>
>>>>> GROMACS complains that it is not able to detect CUDA-aware MPI:
>>>>>
>>>>> GPU-aware MPI was not detected, will not use direct GPU communication. Check the GROMACS install guide for recommendations for GPU-aware support. If you are certain about GPU-aware support in your MPI library, you can force its use by setting the GMX_FORCE_GPU_AWARE_MPI environment variable.
>>>>>
>>>>> OpenMPI is configured like this:
>>>>>
>>>>>   '--disable-opencl' '--with-slurm' '--without-lsf' '--without-opencl'
>>>>>   '--with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/25.1/cuda/12.6'
>>>>>   '--without-rocm'
>>>>>   '--with-knem=/opt/knem-1.1.4.90mlnx3'
>>>>>   '--with-xpmem=/sw/openmpi/5.0.7/g133cu126_ubu2404/xpmem/2.7.3/'
>>>>>   '--with-xpmem-libdir=/sw/openmpi/5.0.7/g133cu126_ubu2404/xpmem/2.7.3//lib'
>>>>>   '--with-ofi=/sw/openmpi/5.0.7/g133cu126_ubu2404/ofi/2.0.0/c126g25xu118'
>>>>>   '--with-ofi-libdir=/sw/openmpi/5.0.7/g133cu126_ubu2404/ofi/2.0.0/c126g25xu118/lib'
>>>>>   '--enable-mca-no-build=btl-usnic'
>>>>>
>>>>> Can somebody help me to build a successful CUDA-aware OpenMPI here?
>>>>>
>>>>> Thanks
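One aside on the container_of compile error quoted above: container_of is the usual Linux-kernel-style macro for recovering the struct that contains a given member. A sketch of the idiom (not Open MPI's exact definition):

    /* Given a pointer to 'member' inside an instance of 'type',
     * step back by the member's offset to get the enclosing struct. */
    #include <stddef.h>   /* offsetof */
    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

So "implicit declaration of function 'container_of'" means the macro definition simply was not visible at that point in mtl_ofi_request.h: a source-level bug in those releases rather than an environment problem, which is consistent with 5.0.7 building past it.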
--
Jeff Squyres

To unsubscribe from this group and stop receiving emails from it, send an email to users+unsubscr...@lists.open-mpi.org.