Hello Gilles,

The gromacs-2024.4 build detects CUDA-aware MPI at the cmake stage:
-- Performing Test HAVE_MPI_EXT
-- Performing Test HAVE_MPI_EXT - Success
-- Performing Test MPI_SUPPORTS_CUDA_AWARE_DETECTION
-- Performing Test MPI_SUPPORTS_CUDA_AWARE_DETECTION - Success
-- Checking for MPI_SUPPORTS_CUDA_AWARE_DETECTION - yes

But at runtime GROMACS does not detect it. The Open MPI MCA/runtime options used are:

--mca btl ofi --mca coll ^hcoll -x GMX_ENABLE_DIRECT_GPU_COMM=true -x PATH -x LD_LIBRARY_PATH -hostfile s_hosts2 -np ${s_nmpi} --map-by numa --bind-to numa

This fails with the following segmentation fault:

--------------------------------------------------------------------------
No components were able to be opened in the btl framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      g100n052
  Framework: btl
--------------------------------------------------------------------------
[g100n052:14172:0:14172] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[g100n052:14176:0:14176] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))

But 'ompi_info' shows the "btl ofi" component is available. The other notable point is that single-node jobs with multiple GPUs work fine: on a single node GROMACS detects GPU-aware MPI and the performance is good, as expected. Only on more than one node does it fail with the segfault above.

On a single node, is it using XPMEM for communication? Is there any Open MPI environment variable or setting that shows which transport is being used for communication between GPUs and between MPI ranks?
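If it helps with debugging, I can re-run with the component-selection verbosity raised so that the chosen pml/mtl/btl components are printed at startup, and re-check that the Open MPI library itself reports CUDA support. A rough sketch only; the verbosity parameters and the ompi_info check are standard Open MPI knobs and not something I have run yet, and the rest of the run line is the same as above:

  # sketch: print which pml/mtl/btl components Open MPI selects at startup
  mpirun --mca pml_base_verbose 100 --mca mtl_base_verbose 100 \
         --mca btl_base_verbose 100 \
         <same transport/mapping options and hostfile as above> ...

  # sketch: confirm the Open MPI library itself was built with CUDA support
  ompi_info --parsable --all | grep mpi_built_with_cuda_support:value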
Thanks

On Sun, Mar 30, 2025 at 10:21 AM Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> Sangam,
>
> The issue should have been fixed in Open MPI 5.0.6.
>
> Anyway, are you certain Open MPI is not GPU aware and this is not
> cmake/GROMACS that failed to detect it?
>
> What if you "configure" GROMACS with
>     cmake -DGMX_FORCE_GPU_AWARE_MPI=ON ...
>
> If the problem persists, please open an issue at
> https://github.com/open-mpi/ompi/issues and do provide the required
> information.
>
> Cheers,
>
> Gilles
>
> On Sun, Mar 30, 2025 at 12:08 AM Sangam B <forum....@gmail.com> wrote:
>
>> Hi,
>>
>> OpenMPI-5.0.5 and 5.0.6 fail with the following error during the
>> "make" stage of the build procedure:
>>
>> In file included from ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:51,
>>                  from ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.c:13:
>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h: In function ‘ompi_mtl_ofi_context_progress’:
>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:5: warning: implicit declaration of function ‘container_of’ [-Wimplicit-function-declaration]
>>    19 | container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>       | ^~~~~~~~~~~~
>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:152:27: note: in expansion of macro ‘TO_OFI_REQ’
>>   152 | ofi_req = TO_OFI_REQ(ompi_mtl_ofi_wc[i].op_context);
>>       | ^~~~~~~~~~
>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:30: error: expected expression before ‘struct’
>>    19 | container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>       | ^~~~~~
>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:152:27: note: in expansion of macro ‘TO_OFI_REQ’
>>   152 | ofi_req = TO_OFI_REQ(ompi_mtl_ofi_wc[i].op_context);
>>       | ^~~~~~~~~~
>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi_request.h:19:30: error: expected expression before ‘struct’
>>    19 | container_of((_ptr_ctx), struct ompi_mtl_ofi_request_t, ctx)
>>       | ^~~~~~
>> ../../../../../../ompi/mca/mtl/ofi/mtl_ofi.h:200:19: note: in expansion of macro ‘TO_OFI_REQ’
>>   200 | ofi_req = TO_OFI_REQ(error.op_context);
>>       | ^~~~~~~~~~
>> make[2]: *** [Makefile:1603: mtl_ofi.lo] Error 1
>>
>> OpenMPI-5.0.7 gets past this error, but it is not able to build CUDA
>> [GPU Direct] & ofi support:
>>
>> GROMACS complains that it is not able to detect CUDA-aware MPI:
>>
>> GPU-aware MPI was not detected, will not use direct GPU communication.
>> Check the GROMACS install guide for recommendations for GPU-aware support.
>> If you are certain about GPU-aware support in your MPI library, you can
>> force its use by setting the GMX_FORCE_GPU_AWARE_MPI environment variable.
>>
>> OpenMPI is configured like this:
>>
>> '--disable-opencl' '--with-slurm' '--without-lsf' '--without-opencl'
>> '--with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/25.1/cuda/12.6'
>> '--without-rocm'
>> '--with-knem=/opt/knem-1.1.4.90mlnx3'
>> '--with-xpmem=/sw/openmpi/5.0.7/g133cu126_ubu2404/xpmem/2.7.3/'
>> '--with-xpmem-libdir=/sw/openmpi/5.0.7/g133cu126_ubu2404/xpmem/2.7.3//lib'
>> '--with-ofi=/sw/openmpi/5.0.7/g133cu126_ubu2404/ofi/2.0.0/c126g25xu118'
>> '--with-ofi-libdir=/sw/openmpi/5.0.7/g133cu126_ubu2404/ofi/2.0.0/c126g25xu118/lib'
>> '--enable-mca-no-build=btl-usnic'
>>
>> Can somebody help me build a working CUDA-aware Open MPI here?
>>
>> Thanks
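Following up on the GMX_FORCE_GPU_AWARE_MPI hint in the GROMACS message quoted above: would it also make sense to force GPU-aware MPI at run time by exporting that variable on the run line? A sketch only; the mdrun binary and its arguments are assumed here, as they are not shown in my run line above:

  mpirun <same MCA/transport and mapping options as above> \
         -x GMX_ENABLE_DIRECT_GPU_COMM=true \
         -x GMX_FORCE_GPU_AWARE_MPI=1 \
         gmx_mpi mdrun ...   # binary/arguments assumed; per the GROMACS message the variable only needs to be set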