Hello Gilles,

No, it does not work even on a single node:

mpirun --mca pml ob1 --mca btl ofi -np 4 test_ompi507stub.exe
--------------------------------------------------------------------------
No components were able to be opened in the btl framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      g100n052
  Framework: btl
--------------------------------------------------------------------------
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_mpi_instance_init failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
[g100n052:00000] *** An error occurred in MPI_Init
[g100n052:00000] *** reported by process [901316609,0]
[g100n052:00000] *** on a NULL communicator
[g100n052:00000] *** Unknown error
[g100n052:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[g100n052:00000] ***    and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
prterun has exited due to process rank 0 with PID 0 on node g100n052 calling
"abort". This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------
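
For reference, test_ompi507stub.exe is just a minimal MPI hello world, roughly:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                 /* aborts here, as shown above */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}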

ompi_info shows:

         MCA accelerator: null (MCA v2.1.0, API v1.0.0, Component v5.0.7)
         MCA accelerator: cuda (MCA v2.1.0, API v1.0.0, Component v5.0.7)
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v5.0.7)
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v5.0.7)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v5.0.7)
                 MCA btl: self (MCA v2.1.0, API v3.3.0, Component v5.0.7)
                 MCA btl: ofi (MCA v2.1.0, API v3.3.0, Component v5.0.7)
                 MCA btl: sm (MCA v2.1.0, API v3.3.0, Component v5.0.7)
                 MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.0.7)
                 MCA btl: uct (MCA v2.1.0, API v3.3.0, Component v5.0.7)
                 MCA btl: smcuda (MCA v2.1.0, API v3.3.0, Component v5.0.7)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v5.0.7)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component v5.0.7)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component v5.0.7)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v5.0.7)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v5.0.7)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v5.0.7)
               MCA mpool: hugepage (MCA v2.1.0, API v3.1.0, Component v5.0.7)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component v5.0.7)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v5.0.7)
              MCA rcache: gpusm (MCA v2.1.0, API v3.3.0, Component v5.0.7)
              MCA rcache: rgpusm (MCA v2.1.0, API v3.3.0, Component v5.0.7)
           MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v5.0.7)
           MCA reachable: netlink (MCA v2.1.0, API v2.0.0, Component v5.0.7)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v5.0.7)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v5.0.7)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v5.0.7)
                MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.0.7)
                MCA smsc: knem (MCA v2.1.0, API v1.0.0, Component v5.0.7)
                MCA smsc: xpmem (MCA v2.1.0, API v1.0.0, Component v5.0.7)
             MCA threads: pthreads (MCA v2.1.0, API v1.0.0, Component v5.0.7)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v5.0.7)
                 MCA bml: r2 (MCA v2.1.0, API v2.1.0, Component v5.0.7)
                MCA coll: adapt (MCA v2.1.0, API v2.4.0, Component v5.0.7)
                MCA coll: basic (MCA v2.1.0, API v2.4.0, Component v5.0.7)
                MCA coll: han (MCA v2.1.0, API v2.4.0, Component v5.0.7)
                MCA coll: inter (MCA v2.1.0, API v2.4.0, Component v5.0.7)
                MCA coll: libnbc (MCA v2.1.0, API v2.4.0, Component v5.0.7)
                MCA coll: self (MCA v2.1.0, API v2.4.0, Component v5.0.7)
                MCA coll: sync (MCA v2.1.0, API v2.4.0, Component v5.0.7)
                MCA coll: tuned (MCA v2.1.0, API v2.4.0, Component v5.0.7)
                MCA coll: cuda (MCA v2.1.0, API v2.4.0, Component v5.0.7)
                MCA coll: ftagree (MCA v2.1.0, API v2.4.0, Component v5.0.7)
                MCA coll: hcoll (MCA v2.1.0, API v2.4.0, Component v5.0.7)
                MCA coll: monitoring (MCA v2.1.0, API v2.4.0, Component v5.0.7)
                MCA coll: sm (MCA v2.1.0, API v2.4.0, Component v5.0.7)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v5.0.7)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v5.0.7)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component v5.0.7)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component v5.0.7)
               MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v5.0.7)
                  MCA fs: lustre (MCA v2.1.0, API v2.0.0, Component v5.0.7)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.0.7)
                MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component v5.0.7)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v5.0.7)
                  MCA io: romio341 (MCA v2.1.0, API v2.0.0, Component v5.0.7)
                 MCA mtl: ofi (MCA v2.1.0, API v2.0.0, Component v5.0.7)
                  MCA op: avx (MCA v2.1.0, API v1.0.0, Component v5.0.7)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v5.0.7)
                 MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component v5.0.7)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v5.0.7)
                 MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v5.0.7)
                MCA part: persist (MCA v2.1.0, API v4.0.0, Component v5.0.7)
                 MCA pml: cm (MCA v2.1.0, API v2.1.0, Component v5.0.7)
                 MCA pml: monitoring (MCA v2.1.0, API v2.1.0, Component v5.0.7)
                 MCA pml: ob1 (MCA v2.1.0, API v2.1.0, Component v5.0.7)
                 MCA pml: ucx (MCA v2.1.0, API v2.1.0, Component v5.0.7)
                 MCA pml: v (MCA v2.1.0, API v2.1.0, Component v5.0.7)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component v5.0.7)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component v5.0.7)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v5.0.7)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v5.0.7)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component v5.0.7)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component v5.0.7)

OpenMPI is configured like this:

  Configure command line: '--prefix=/sw/openmpi/5.0.7/g133cu126stubU2404/xp_minu118ofi2'
                          '--without-lsf'
                          '--with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/25.1/cuda/12.6'
                          '--with-cuda-libdir=/opt/nvidia/hpc_sdk/Linux_x86_64/25.1/cuda/12.6/targets/x86_64-linux/lib/stubs'
                          '--with-knem=/opt/knem-1.1.4.90mlnx3'
                          '--with-xpmem=/sw/openmpi/5.0.7/g133cu126stubU2404/xpmem/2.7.3/'
                          '--with-xpmem-libdir=/sw/openmpi/5.0.7/g133cu126stubU2404/xpmem/2.7.3//lib'
                          '--with-ofi=/sw/openmpi/5.0.7/g133cu126stubU2404/ofi/2.0.0/c126g25xu118'
                          '--with-ofi-libdir=/sw/openmpi/5.0.7/g133cu126stubU2404/ofi/2.0.0/c126g25xu118/lib'
                          '--enable-mca-no-build=btl-usnic'
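
For what it's worth, compile-time CUDA support in this build can be verified with the standard ompi_info query from the Open MPI docs; I can send that output too:

        ompi_info --parsable --all | grep mpi_built_with_cuda_support:value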

UCX-1.18.0 is configured like this:

../../configure --prefix=${s_pfix} \
        --enable-mt \
        --without-rocm \
        --with-cuda=${cuda_path} \
        --with-knem=${knem_path} \
        --with-xpmem=${xpmem_path}
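
If useful, ucx_info from this UCX build should show whether the cuda, knem and xpmem transports were actually built in; e.g.:

        ucx_info -d | grep Transport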

OFI (libfabric-2.0.0) is configured like this:

./configure \
        --prefix=${s_pfix} \
        --enable-shm=dl \
        --enable-sockets=dl \
        --enable-udp=dl \
        --enable-tcp=dl \
        --enable-rxm=dl \
        --enable-rxd=dl \
        --enable-verbs=dl \
        --enable-psm2=no \
        --enable-psm3=no \
        --enable-ucx=dl:${ucx_path} \
        --enable-gdrcopy-dlopen --with-gdrcopy=/usr \
        --enable-cuda-dlopen --with-cuda=${cuda_path} \
        --enable-xpmem=${xpmem_path} > config.out

With some verbose settings, it is not able to initialize the btl ofi component:

[g100n052:24169] mca: base: components_register: component ofi register function successful
[g100n052:24169] mca: base: components_open: opening btl components
[g100n052:24169] mca: base: components_open: found loaded component ofi
[g100n052:24169] mca: base: components_open: component ofi open function successful
[g100n052:24172] select: initializing btl component ofi
[g100n052:24171] select: initializing btl component ofi
[g100n052:24170] select: initializing btl component ofi
[g100n052:24169] select: initializing btl component ofi
[g100n052:24172] select: init of component ofi returned failure
[g100n052:24171] select: init of component ofi returned failure
[g100n052:24169] select: init of component ofi returned failure
[g100n052:24170] select: init of component ofi returned failure
[g100n052:24172] mca: base: close: component ofi closed
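
In case it helps narrow down why the ofi BTL fails to initialize, I can also collect libfabric-side diagnostics, e.g. (assuming fi_info from the same libfabric 2.0.0 install is in PATH):

        # list the providers this libfabric build exposes
        fi_info -l

        # re-run with libfabric debug logging on top of the BTL verbosity
        mpirun --mca pml ob1 --mca btl ofi --mca btl_base_verbose 100 \
               -x FI_LOG_LEVEL=debug -np 4 test_ompi507stub.exe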

Thanks

On Mon, Mar 31, 2025 at 12:25 PM Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> Sangam,
>
> What if you run a simple MPI hello world program with
> mpirun --mca pml ob1 --mca btl ofi ...
>
> on one and several nodes?
>
> Cheers,
>
> Gilles
>
> On Mon, Mar 31, 2025 at 3:48 PM Sangam B <forum....@gmail.com> wrote:
>
>> Hello Gilles,
>>
>> The gromacs-2024.4 build shows, at the cmake stage, that CUDA-aware MPI is
>> detected:
>>
>> -- Performing Test HAVE_MPI_EXT
>> -- Performing Test HAVE_MPI_EXT - Success
>> -- Performing Test MPI_SUPPORTS_CUDA_AWARE_DETECTION
>> -- Performing Test MPI_SUPPORTS_CUDA_AWARE_DETECTION - Success
>> -- Checking for MPI_SUPPORTS_CUDA_AWARE_DETECTION - yes
>>
>> But at runtime it is not detected.
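>>
>> My understanding is that this runtime check boils down to Open MPI's
>> MPIX_Query_cuda_support() from <mpi-ext.h>, so a standalone check would
>> look roughly like this (I can run it if that is useful):
>>
>> #include <stdio.h>
>> #include <mpi.h>
>> #include <mpi-ext.h>   /* Open MPI extensions */
>>
>> int main(int argc, char **argv)
>> {
>>     MPI_Init(&argc, &argv);
>> #if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
>>     /* built with CUDA support; ask whether it is usable at run time */
>>     printf("runtime CUDA-aware: %d\n", MPIX_Query_cuda_support());
>> #else
>>     printf("no compile-time CUDA support in this build\n");
>> #endif
>>     MPI_Finalize();
>>     return 0;
>> }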
>>
>> The OpenMPI MCA transports used are:
>>
>>  --mca btl ofi  --mca coll ^hcoll -x GMX_ENABLE_DIRECT_GPU_COMM=true -x
>> PATH -x LD_LIBRARY_PATH  -hostfile s_hosts2 -np ${s_nmpi} --map-by numa
>> --bind-to numa
>>
>> This fails with the following seg-fault error:
>>
>> --------------------------------------------------------------------------
>> No components were able to be opened in the btl framework.
>>
>> This typically means that either no components of this type were
>> installed, or none of the installed components can be loaded.
>> Sometimes this means that shared libraries required by these
>> components are unable to be found/loaded.
>>
>>   Host:      g100n052
>>   Framework: btl
>> --------------------------------------------------------------------------
>> [g100n052:14172:0:14172] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
>> [g100n052:14176:0:14176] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
>>
>>
>> But 'ompi_info' shows "btl ofi" is available.
>>
>> And the other notable point is that single-node jobs with multiple GPUs
>> work fine; on a single node, GROMACS detects GPU-aware MPI and the
>> performance is good, as expected.
>>
>> It fails with the above seg-fault error only when running on more than one node.
>>
>> On a single node, is it using XPMEM for communication?
>>
>> Is there an OpenMPI environment variable that shows which transport is
>> being used for communication between GPUs and between MPI ranks?
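>>
>> (For example, is running with "--mca pml_base_verbose 10 --mca
>> btl_base_verbose 10" the right way to see that?)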
>>
>> Thanks
>>
>> On Sun, Mar 30, 2025 at 10:21 AM Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>
>>> Sangam,
>>>
>>> The issue should have been fixed in Open MPI 5.0.6.
>>>
>>> Anyway, are you certain Open MPI is not GPU aware, and that it is not
>>> cmake/GROMACS that failed to detect it?
>>>
>>> What if you "configure" GROMACS with
>>> cmake -DGMX_FORCE_GPU_AWARE_MPI=ON ...
>>>
>>> If the problem persists, please open an issue at
>>> https://github.com/open-mpi/ompi/issues and do provide the required
>>> information.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Sun, Mar 30, 2025 at 12:08 AM Sangam B <forum....@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> OpenMPI-5.0.5 and 5.0.6 both fail with the following error during the
>>>> "make" stage of the build procedure:
>>>>
>>>> In file included from ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:51,
>>>>                  from ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.c:13:
>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h: In function ‘ompi_mtl_ofi_context_progress’:
>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:5: warning: implicit declaration of function ‘container_of’ [-Wimplicit-function-declaration]
>>>>    19 |     container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>>>       |     ^~~~~~~~~~~~
>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:152:27: note: in expansion of macro ‘TO_OFI_REQ’
>>>>   152 |                 ofi_req = TO_OFI_REQ(ompi_mtl_ofi_wc[i].op_context);
>>>>       |                           ^~~~~~~~~~
>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:30: error: expected expression before ‘struct’
>>>>    19 |     container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>>>       |                              ^~~~~~
>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:152:27: note: in expansion of macro ‘TO_OFI_REQ’
>>>>   152 |                 ofi_req = TO_OFI_REQ(ompi_mtl_ofi_wc[i].op_context);
>>>>       |                           ^~~~~~~~~~
>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:30: error: expected expression before ‘struct’
>>>>    19 |     container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>>>       |                              ^~~~~~
>>>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:200:19: note: in expansion of macro ‘TO_OFI_REQ’
>>>>   200 |         ofi_req = TO_OFI_REQ(error.op_context);
>>>>       |                   ^~~~~~~~~~
>>>> make[2]: *** [Makefile:1603: mtl_ofi.lo] Error 1
>>>>
>>>> OpenMPI-5.0.7 gets past this error, but the resulting build does not
>>>> provide working CUDA [GPUDirect] & OFI support:
>>>>
>>>> GROMACS complains that it is not able to detect CUDA-aware MPI:
>>>>
>>>> GPU-aware MPI was not detected, will not use direct GPU communication.
>>>> Check the GROMACS install guide for recommendations for GPU-aware support.
>>>> If you are certain about GPU-aware support in your MPI library, you can
>>>> force its use by setting the GMX_FORCE_GPU_AWARE_MPI environment variable.
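>>>>
>>>> Forcing it would presumably be something like this, though I have not
>>>> tried it yet:
>>>>
>>>>     export GMX_FORCE_GPU_AWARE_MPI=1
>>>>     mpirun ... gmx_mpi mdrun ...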
>>>>
>>>> OpenMPI is configured like this:
>>>>
>>>> '--disable-opencl' '--with-slurm' '--without-lsf' '--without-opencl'
>>>> '--with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/25.1/cuda/12.6'
>>>> '--without-rocm'
>>>> '--with-knem=/opt/knem-1.1.4.90mlnx3'
>>>> '--with-xpmem=/sw/openmpi/5.0.7/g133cu126_ubu2404/xpmem/2.7.3/'
>>>> '--with-xpmem-libdir=/sw/openmpi/5.0.7/g133cu126_ubu2404/xpmem/2.7.3//lib'
>>>> '--with-ofi=/sw/openmpi/5.0.7/g133cu126_ubu2404/ofi/2.0.0/c126g25xu118'
>>>> '--with-ofi-libdir=/sw/openmpi/5.0.7/g133cu126_ubu2404/ofi/2.0.0/c126g25xu118/lib'
>>>> '--enable-mca-no-build=btl-usnic'
>>>>
>>>> Can somebody help me build a working CUDA-aware OpenMPI here?
>>>>
>>>> Thanks
>>>>