Try running with --mca btl_base_verbose 100; that will enable much more verbose
output from the BTL framework and hopefully show what's going wrong.
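
For example (reusing the single-node command from your mail below):

  mpirun --mca pml ob1 --mca btl ofi --mca btl_base_verbose 100 -np 4 test_ompi507stub.exe

That should show in more detail why the ofi BTL's init failed, rather than just
"returned failure".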


> On Mar 31, 2025, at 8:57 AM, Sangam B <forum....@gmail.com> wrote:
> 
> Hello Gilles,
> 
> No, it is not working even on a single node:
> 
> mpirun  --mca pml ob1 --mca btl ofi   -np 4 test_ompi507stub.exe
> --------------------------------------------------------------------------
> No components were able to be opened in the btl framework.
> 
> This typically means that either no components of this type were
> installed, or none of the installed components can be loaded.
> Sometimes this means that shared libraries required by these
> components are unable to be found/loaded.
> 
>   Host:      g100n052
>   Framework: btl
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> MPI_INIT has failed because at least one MPI process is unreachable
> from another.  This *usually* means that an underlying communication
> plugin -- such as a BTL or an MTL -- has either not loaded or not
> allowed itself to be used.  Your MPI job will now abort.
> 
> You may wish to try to narrow down the problem;
> 
>  * Check the output of ompi_info to see which BTL/MTL plugins are
>    available.
>  * Run your application with MPI_THREAD_SINGLE.
>  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
>    if using MTL-based communications) to see exactly which
>    communication plugins were considered and/or discarded.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> 
>   ompi_mpi_init: ompi_mpi_instance_init failed
>   --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> [g100n052:00000] *** An error occurred in MPI_Init
> [g100n052:00000] *** reported by process [901316609,0]
> [g100n052:00000] *** on a NULL communicator
> [g100n052:00000] *** Unknown error
> [g100n052:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator 
> will now abort,
> [g100n052:00000] ***    and MPI will try to terminate your MPI job as well)
> --------------------------------------------------------------------------
> prterun has exited due to process rank 0 with PID 0 on node g100n052 calling
> "abort". This may have caused other processes in the application to be
> terminated by signals sent by prterun (as reported here).
> --------------------------------------------------------------------------
> 
> ompi_info shows:
> 
>         MCA accelerator: null (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>          MCA accelerator: cuda (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>            MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>            MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>            MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                  MCA btl: self (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>                  MCA btl: ofi (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>                  MCA btl: sm (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>                  MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>                  MCA btl: uct (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>                  MCA btl: smcuda (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>                   MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>                   MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
>                           v5.0.7)
>                   MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
>                           v5.0.7)
>          MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>          MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>               MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                MCA mpool: hugepage (MCA v2.1.0, API v3.1.0, Component v5.0.7)
>              MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
>                           v5.0.7)
>               MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>               MCA rcache: gpusm (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>               MCA rcache: rgpusm (MCA v2.1.0, API v3.3.0, Component v5.0.7)
>            MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>            MCA reachable: netlink (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                 MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>                 MCA smsc: knem (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>                 MCA smsc: xpmem (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>              MCA threads: pthreads (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>                MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                  MCA bml: r2 (MCA v2.1.0, API v2.1.0, Component v5.0.7)
>                 MCA coll: adapt (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>                 MCA coll: basic (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>                 MCA coll: han (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>                 MCA coll: inter (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>                 MCA coll: libnbc (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>                 MCA coll: self (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>                 MCA coll: sync (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>                 MCA coll: tuned (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>                 MCA coll: cuda (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>                 MCA coll: ftagree (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>                 MCA coll: hcoll (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>                 MCA coll: monitoring (MCA v2.1.0, API v2.4.0, Component
>                           v5.0.7)
>                 MCA coll: sm (MCA v2.1.0, API v2.4.0, Component v5.0.7)
>                 MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
>                           v5.0.7)
>                MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
>                           v5.0.7)
>                MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                   MCA fs: lustre (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                   MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                 MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component
>                           v5.0.7)
>                   MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                   MCA io: romio341 (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                  MCA mtl: ofi (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                   MCA op: avx (MCA v2.1.0, API v1.0.0, Component v5.0.7)
>                  MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v5.0.7)
>                  MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
>                           v5.0.7)
>                  MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v5.0.7)
>                  MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v5.0.7)
>                 MCA part: persist (MCA v2.1.0, API v4.0.0, Component v5.0.7)
>                  MCA pml: cm (MCA v2.1.0, API v2.1.0, Component v5.0.7)
>                  MCA pml: monitoring (MCA v2.1.0, API v2.1.0, Component
>                           v5.0.7)
>                  MCA pml: ob1 (MCA v2.1.0, API v2.1.0, Component v5.0.7)
>                  MCA pml: ucx (MCA v2.1.0, API v2.1.0, Component v5.0.7)
>                  MCA pml: v (MCA v2.1.0, API v2.1.0, Component v5.0.7)
>             MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
>                           v5.0.7)
>             MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
>                           v5.0.7)
>             MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v5.0.7)
>                 MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v5.0.7)
>                 MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
>                           v5.0.7)
>            MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
>                           v5.0.7)
> 
> OpenMPI is configured like this:
> 
>   Configure command line: 
> '--prefix=/sw/openmpi/5.0.7/g133cu126stubU2404/xp_minu118ofi2'
>                           '--without-lsf'
>                           
> '--with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/25.1/cuda/12.6'
>                           
> '--with-cuda-libdir=/opt/nvidia/hpc_sdk/Linux_x86_64/25.1/cuda/12.6/targets/x86_64-linux/lib/stubs'
>                           '--with-knem=/opt/knem-1.1.4.90mlnx3'
>                           
> '--with-xpmem=/sw/openmpi/5.0.7/g133cu126stubU2404/xpmem/2.7.3/'
>                           
> '--with-xpmem-libdir=/sw/openmpi/5.0.7/g133cu126stubU2404/xpmem/2.7.3//lib'
>                           
> '--with-ofi=/sw/openmpi/5.0.7/g133cu126stubU2404/ofi/2.0.0/c126g25xu118'
>                           
> '--with-ofi-libdir=/sw/openmpi/5.0.7/g133cu126stubU2404/ofi/2.0.0/c126g25xu118/lib'
>                           '--enable-mca-no-build=btl-usnic'
> 
> UCX-1.18.0 is configured like this:
> 
> ../../configure --prefix=${s_pfix} \
>         --enable-mt \
>         --without-rocm \
>         --with-cuda=${cuda_path} \
>         --with-knem=${knem_path} \
>         --with-xpmem=${xpmem_path} 
> 
> OFI [libfabric-2.0.0] is configured:
> 
> ./configure \
>         --prefix=${s_pfix} \
>         --enable-shm=dl \
>         --enable-sockets=dl \
>         --enable-udp=dl \
>         --enable-tcp=dl \
>         --enable-rxm=dl \
>         --enable-rxd=dl \
>         --enable-verbs=dl \
>         --enable-psm2=no \
>         --enable-psm3=no \
>         --enable-ucx=dl:${ucx_path} \
>         --enable-gdrcopy-dlopen --with-gdrcopy=/usr \
>        --enable-cuda-dlopen --with-cuda=${cuda_path} \
>         --enable-xpmem=${xpmem_path} > config.out
> 
> With some verbose settings enabled, it is not able to initialize the btl ofi component:
> 
> [g100n052:24169] mca: base: components_register: component ofi register 
> function successful
> [g100n052:24169] mca: base: components_open: opening btl components
> [g100n052:24169] mca: base: components_open: found loaded component ofi
> [g100n052:24169] mca: base: components_open: component ofi open function 
> successful
> [g100n052:24172] select: initializing btl component ofi
> [g100n052:24171] select: initializing btl component ofi
> [g100n052:24170] select: initializing btl component ofi
> [g100n052:24169] select: initializing btl component ofi
> [g100n052:24172] select: init of component ofi returned failure
> [g100n052:24171] select: init of component ofi returned failure
> [g100n052:24169] select: init of component ofi returned failure
> [g100n052:24170] select: init of component ofi returned failure
> [g100n052:24172] mca: base: close: component ofi closed
> 
> 
> 
> Thanks
> 
> 
> 
> On Mon, Mar 31, 2025 at 12:25 PM Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com <mailto:gilles.gouaillar...@gmail.com>> wrote:
>> Sangam,
>> 
>> What if you run a simple MPI hello world program with
>> mpirun --mca pml ob1 --mca btl ofi ...
>> 
>> on one and several nodes?
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On Mon, Mar 31, 2025 at 3:48 PM Sangam B <forum....@gmail.com 
>> <mailto:forum....@gmail.com>> wrote:
>>> Hello Gilles,
>>> 
>>> The gromacs-2024.4 build, at the cmake stage, shows that CUDA-aware MPI is 
>>> detected:
>>> 
>>> -- Performing Test HAVE_MPI_EXT
>>> -- Performing Test HAVE_MPI_EXT - Success
>>> -- Performing Test MPI_SUPPORTS_CUDA_AWARE_DETECTION
>>> -- Performing Test MPI_SUPPORTS_CUDA_AWARE_DETECTION - Success
>>> -- Checking for MPI_SUPPORTS_CUDA_AWARE_DETECTION - yes
>>> 
>>> But during runtime it is not able to detect it.
>>> 
>>> The OpenMPI MCA transports used are:
>>> 
>>>  --mca btl ofi  --mca coll ^hcoll -x GMX_ENABLE_DIRECT_GPU_COMM=true -x 
>>> PATH -x LD_LIBRARY_PATH  -hostfile s_hosts2 -np ${s_nmpi} --map-by numa 
>>> --bind-to numa
>>> 
>>> This fails with the following seg-fault error:
>>> 
>>> --------------------------------------------------------------------------
>>> No components were able to be opened in the btl framework.
>>> 
>>> This typically means that either no components of this type were
>>> installed, or none of the installed components can be loaded.
>>> Sometimes this means that shared libraries required by these
>>> components are unable to be found/loaded.
>>> 
>>>   Host:      g100n052
>>>   Framework: btl
>>> --------------------------------------------------------------------------
>>> ^@[g100n052:14172:0:14172] Caught signal 11 (Segmentation fault: address 
>>> not mapped to object at address (nil))
>>> [g100n052:14176:0:14176] Caught signal 11 (Segmentation fault: address not 
>>> mapped to object at address (nil))
>>> 
>>> 
>>> But 'ompi_info' shows "btl ofi" is available.
>>> 
>>> The other notable point is that single-node jobs with multiple GPUs work 
>>> fine: on a single node GROMACS detects GPU-aware MPI and the performance is 
>>> good, as expected. 
>>> 
>>> Only on more than one node does it fail with the above seg-fault error.
>>> 
>>> On a single node, is it using XPMEM for communication?
>>> 
>>> Is there any OpenMPI env variable to show what transport is being used for 
>>> communication between GPUs and between MPI ranks?
>>> 
>>> Thanks
>>> 
>>> On Sun, Mar 30, 2025 at 10:21 AM Gilles Gouaillardet 
>>> <gilles.gouaillar...@gmail.com <mailto:gilles.gouaillar...@gmail.com>> 
>>> wrote:
>>>> Sangam,
>>>> 
>>>> The issue should have been fixed in Open MPI 5.0.6.
>>>> 
>>>> Anyway, are you certain Open MPI is not GPU-aware, and that it is not 
>>>> cmake/GROMACS that failed to detect it?
>>>> 
>>>> What if you "configure" GROMACS with
>>>> cmake -DGMX_FORCE_GPU_AWARE_MPI=ON ...
>>>> 
>>>> If the problem persists, please open an issue at 
>>>> https://github.com/open-mpi/ompi/issues and do provide the required 
>>>> information.
>>>> 
>>>> Cheers,
>>>> 
>>>> Gilles
>>>> 
>>>> On Sun, Mar 30, 2025 at 12:08 AM Sangam B <forum....@gmail.com 
>>>> <mailto:forum....@gmail.com>> wrote:
>>>>> Hi,
>>>>> 
>>>>>        The OpenMPI-5.0.5 and 5.0.6 versions fail with the following error 
>>>>> during the "make" stage of the build procedure:
>>>>> 
>>>>> In file included from ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:51,
>>>>>                  from ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.c:13:
>>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h: In function 
>>>>> ‘ompi_mtl_ofi_context_progress’:
>>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:5: warning: 
>>>>> implicit declaration of function ‘container_of’ 
>>>>> [-Wimplicit-function-declaration]
>>>>>    19 |     container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>>>>       |     ^~~~~~~~~~~~
>>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:152:27: note: in expansion 
>>>>> of macro ‘TO_OFI_REQ’
>>>>>   152 |                 ofi_req = 
>>>>> TO_OFI_REQ(ompi_mtl_ofi_wc[i].op_context);
>>>>>       |                           ^~~~~~~~~~
>>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:30: error: 
>>>>> expected expression before ‘struct’
>>>>>    19 |     container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>>>>       |                              ^~~~~~
>>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:152:27: note: in expansion 
>>>>> of macro ‘TO_OFI_REQ’
>>>>>   152 |                 ofi_req = 
>>>>> TO_OFI_REQ(ompi_mtl_ofi_wc[i].op_context);
>>>>>       |                           ^~~~~~~~~~
>>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:30: error: 
>>>>> expected expression before ‘struct’
>>>>>    19 |     container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>>>>       |                              ^~~~~~
>>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:200:19: note: in expansion 
>>>>> of macro ‘TO_OFI_REQ’
>>>>>   200 |         ofi_req = TO_OFI_REQ(error.op_context);
>>>>>       |                   ^~~~~~~~~~
>>>>> make[2]: *** [Makefile:1603: mtl_ofi.lo] Error 1
>>>>> 
>>>>> OpenMPI-5.0.7 gets past this error, but it is not able to build CUDA 
>>>>> [GPUDirect] & OFI support:
>>>>> 
>>>>> The GROMACS application complains that it is not able to detect CUDA-aware 
>>>>> MPI:
>>>>> 
>>>>> GPU-aware MPI was not detected, will not use direct GPU communication. 
>>>>> Check the GROMACS install guide for recommendations for GPU-aware 
>>>>> support. If you are certain about GPU-aware support in your MPI library, 
>>>>> you can force its use by setting the GMX_FORCE_GPU_AWARE_MPI environment 
>>>>> variable.
>>>>> 
>>>>> OpenMPI is configured like this:
>>>>> 
>>>>> '--disable-opencl' '--with-slurm' '--without-lsf'
>>>>>                           '--without-opencl'
>>>>>                           
>>>>> '--with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/25.1/cuda/12.6'
>>>>>                           '--without-rocm'
>>>>>                           '--with-knem=/opt/knem-1.1.4.90mlnx3'
>>>>>                           
>>>>> '--with-xpmem=/sw/openmpi/5.0.7/g133cu126_ubu2404/xpmem/2.7.3/'
>>>>>                           
>>>>> '--with-xpmem-libdir=/sw/openmpi/5.0.7/g133cu126_ubu2404/xpmem/2.7.3//lib'
>>>>>                           
>>>>> '--with-ofi=/sw/openmpi/5.0.7/g133cu126_ubu2404/ofi/2.0.0/c126g25xu118'
>>>>>                           
>>>>> '--with-ofi-libdir=/sw/openmpi/5.0.7/g133cu126_ubu2404/ofi/2.0.0/c126g25xu118/lib'
>>>>>                           '--enable-mca-no-build=btl-usnic'
>>>>> 
>>>>> Can somebody help me build a working CUDA-aware Open MPI here?
>>>>> 
>>>>> Thanks
>>>>> 


-- 
Jeff Squyres
