There is a relevant explanation of the same issue reported for Julia:
https://github.com/JuliaGPU/CUDA.jl/issues/1053

On Fri, May 30, 2025 at 19:05, Mike Adams <mikecarlad...@gmail.com> wrote:

> Hi Tommy,
>
> I'm setting btl_smcuda_use_cuda_ipc_same_gpu 0 and btl_smcuda_use_cuda_ipc 0.
>
> So, are you saying that with these params, it is also not using GPUDirect
> RDMA?
>
> PSC Bridges-2 only has OpenMPI v4, but they may be working on installing
> v5 now. Everything works with v5 on NCSA Delta; I'll try to test on an
> older OpenMPI.
>
> Mike Adams
> On Friday, May 30, 2025 at 10:54:23 AM UTC-6 Tomislav Janjusic US wrote:
>
>> Hi,
>>
>> I'm not sure if it's a known issue. It may be in v4.0; I'm not sure about
>> v4.1 or v5.0. Can you try those?
>> As for CUDA IPC, how are you disabling it? I don't remember the MCA
>> params in v4.0.
>> If it's through either pml ucx or smcuda, then no, it won't use it.
>> -Tommy
>>
>>
>> On Saturday, May 24, 2025 at 8:56:50 AM UTC-7 Mike Adams wrote:
>>
>>> Hi, I'm using OpenMPI 4.0.5 with CUDA support on PSC Bridges-2.  I'm
>>> calling collectives like MPI_Allreduce on buffers that have already been
>>> shared between ranks via cudaIpcGetMemHandle/cudaIpcOpenMemHandle.
>>>
>>> On these buffers, I receive the following message and some communication
>>> sizes fail:
>>>
>>>
>>> --------------------------------------------------------------------------
>>> The call to cuIpcGetMemHandle failed. This means the GPU RDMA protocol
>>> cannot be used.
>>>   cuIpcGetMemHandle return value:   1
>>>   address: 0x147d54000068
>>> Check the cuda.h file for what the return value means. Perhaps a reboot
>>> of the node will clear the problem.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> If I pass in the two mca parameters to disable OpenMPI IPC, everything
>>> works.
>>>
>>> I'm wondering two things:
>>> 1. Is this failure to handle IPC buffers in OpenMPI 4 a known issue?
>>> 2. When I disable OpenMPI CUDA IPC with MCA parameters, does OpenMPI
>>>    still use GPUDirect RDMA?
>>>
>>> Thanks,
>>>
>>> Mike Adams
>>>
>> To unsubscribe from this group and stop receiving emails from it, send an
> email to users+unsubscr...@lists.open-mpi.org.
>
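For reference, a minimal sketch of the workaround Mike describes in the thread: launching with both smcuda CUDA IPC MCA parameters (btl_smcuda_use_cuda_ipc and btl_smcuda_use_cuda_ipc_same_gpu) set to 0. It only builds the command line. The program name `./my_app` and the rank count are hypothetical placeholders. Open MPI also reads MCA settings from `OMPI_MCA_<param>` environment variables, shown as an alternative.

```python
import os

# The two MCA parameters named in this thread; both set to 0 to turn off
# CUDA IPC in Open MPI's smcuda BTL.
IPC_OFF = {
    "btl_smcuda_use_cuda_ipc": "0",
    "btl_smcuda_use_cuda_ipc_same_gpu": "0",
}


def mpirun_argv(program, nprocs, extra_args=()):
    """Build an mpirun command line passing each setting via --mca."""
    argv = ["mpirun", "-np", str(nprocs)]
    for name, value in IPC_OFF.items():
        argv += ["--mca", name, value]
    return argv + [program, *extra_args]


def mca_env(base=None):
    """Same settings as environment variables (OMPI_MCA_<name>=<value>)."""
    env = dict(os.environ if base is None else base)
    for name, value in IPC_OFF.items():
        env["OMPI_MCA_" + name] = value
    return env


if __name__ == "__main__":
    # "./my_app" is a hypothetical program name.
    print(" ".join(mpirun_argv("./my_app", 2)))
    # To actually launch: subprocess.run(mpirun_argv("./my_app", 2), check=True)
```

Per Tommy's reply, disabling CUDA IPC this way only affects the smcuda path. It says nothing about whether GPUDirect RDMA is used elsewhere.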
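The error text says to check cuda.h for the meaning of `cuIpcGetMemHandle return value: 1`. In the CUDA driver API's `CUresult` enum, 1 is `CUDA_ERROR_INVALID_VALUE`. With the CUDA toolkit installed, `cuGetErrorName()` returns that name directly. The sketch below is a toolkit-free lookup that copies only the first few `CUresult` values from cuda.h, so it is deliberately not a complete table.

```python
# First few values of the CUresult enum from cuda.h (CUDA driver API).
# Deliberately partial: the real enum defines many more codes.
CURESULT_NAMES = {
    0: "CUDA_SUCCESS",
    1: "CUDA_ERROR_INVALID_VALUE",
    2: "CUDA_ERROR_OUT_OF_MEMORY",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    4: "CUDA_ERROR_DEINITIALIZED",
}


def curesult_name(code):
    """Return the cuda.h name for a CUresult code if it is in the table."""
    return CURESULT_NAMES.get(code, f"unknown CUresult {code} (see cuda.h)")


if __name__ == "__main__":
    # Value Open MPI reported for cuIpcGetMemHandle in this thread.
    print(curesult_name(1))  # CUDA_ERROR_INVALID_VALUE
```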

