There is a relevant explanation of the same issue reported for Julia: https://github.com/JuliaGPU/CUDA.jl/issues/1053
On Fri, 30 May 2025 at 19:05, Mike Adams <mikecarlad...@gmail.com> wrote:

> Hi Tommy,
>
> I'm setting btl_smcuda_use_cuda_ipc_same_gpu 0 and btl_smcuda_use_cuda_ipc 0.
>
> So, are you saying that with these params, it is also not using GPUDirect
> RDMA?
>
> PSC Bridges 2 only has v4 OpenMPI, but they may be working on installing
> v5 now. Everything works on v5 on NCSA Delta - I'll try to test on an
> older OpenMPI.
>
> Mike Adams
> On Friday, May 30, 2025 at 10:54:23 AM UTC-6 Tomislav Janjusic US wrote:
>
>> Hi,
>>
>> I'm not sure if it's a known issue, in v4.0 possibly, not sure about v4.1
>> or v5.0 - can you try?
>> As far as CUDA IPC - how are you disabling it? I don't remember the mca
>> params in v4.0
>> If it's either through pml ucx, or smcuda then no, it won't use it.
>> -Tommy
>>
>>
>> On Saturday, May 24, 2025 at 8:56:50 AM UTC-7 Mike Adams wrote:
>>
>>> Hi, I'm using OpenMPI 4.0.5 with CUDA support on PSC Bridges-2. I'm
>>> calling collectives like MPI_Allreduce on buffers that have already been
>>> shared between ranks via cudaIpcGetMemHandle/cudaIpcOpenMemHandle.
>>>
>>> On these buffers, I receive the following message and some communication
>>> sizes fail:
>>>
>>>
>>> --------------------------------------------------------------------------
>>> The call to cuIpcGetMemHandle failed. This means the GPU RDMA protocol
>>> cannot be used.
>>> cuIpcGetMemHandle return value: 1
>>> address: 0x147d54000068
>>> Check the cuda.h file for what the return value means. Perhaps a reboot
>>> of the node will clear the problem.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> If I pass in the two mca parameters to disable OpenMPI IPC, everything
>>> works.
>>>
>>> I'm wondering two things:
>>> Is this failure to handle IPC buffers in OpenMPI 4 a known issue?
>>> When I disable OpenMPI CUDA IPC with mca parameters, does OpenMPI still
>>> use GPUDirect RDMA?
>>>
>>> Thanks,
>>>
>>> Mike Adams
>>>
>>
> To unsubscribe from this group and stop receiving emails from it, send an
> email to users+unsubscr...@lists.open-mpi.org.
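For reference, here is a minimal sketch of how the two MCA parameters named in this thread (`btl_smcuda_use_cuda_ipc` and `btl_smcuda_use_cuda_ipc_same_gpu`) can be set. You can pass them on the `mpirun` command line with `--mca`, or through `OMPI_MCA_`-prefixed environment variables, which are Open MPI's general mechanisms for MCA parameters. The rank count and `./app` are placeholders and do not come from the thread.

```shell
# Option 1: pass the parameters on the mpirun command line
# (-np 2 and ./app are placeholders)
mpirun -np 2 \
    --mca btl_smcuda_use_cuda_ipc 0 \
    --mca btl_smcuda_use_cuda_ipc_same_gpu 0 \
    ./app

# Option 2: set the same parameters through the environment;
# Open MPI reads OMPI_MCA_<param_name> as the MCA parameter <param_name>
export OMPI_MCA_btl_smcuda_use_cuda_ipc=0
export OMPI_MCA_btl_smcuda_use_cuda_ipc_same_gpu=0
mpirun -np 2 ./app
```

Both forms set the same parameters. These names belong to the smcuda BTL, so as Tommy notes, a run going through pml ucx is not governed by them.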