Can you post the full mpirun command, or at least the relevant MPI MCA
params?
"I'm still curious about your input on whether or not those mca parameters
I mentioned yesterday are disabling GPUDirect RDMA as well?"
Even if you disable sm_cuda_ipc, it's possible you're still using CUDA IPC
via
mpirun --mca btl_smcuda_use_cuda_ipc_same_gpu 0 \
    --mca btl_smcuda_use_cuda_ipc 0 \
    --map-by ppr:2:numa --bind-to core --rank-by slot \
    --display-map --display-allocation --report-bindings \
    ./multilane_ring_allreduce
where there is 1 GPU per NUMA region.
I am not sure which pml I'm using, but since th
Add --mca pml_base_verbose 90 to the mpirun command and you should see
something like this:
[rock18:3045236] select: component ucx selected
[rock18:3045236] select: component ob1 not selected / finalized
Or whatever your ompi instance selected.
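If the verbose output gets long, something like the following could pull out
just the selected component. This is only a rough sketch: selected_pml is a
helper name I made up, and it assumes the log lines look like the two above
and that the verbose output may land on stderr (hence the 2>&1 in the usage
comment).

```shell
# Hypothetical helper (not part of Open MPI): read pml_base_verbose output
# on stdin and print each component reported as "selected", once.
# Assumes lines of the form "select: component <name> selected".
selected_pml() {
    # Lines such as "component ob1 not selected" do not match, because the
    # pattern requires "selected" directly after the component name.
    sed -n 's/.*select: component \([A-Za-z0-9_]*\) selected.*/\1/p' | sort -u
}

# Assumed usage; substitute your own mpirun arguments:
#   mpirun --mca pml_base_verbose 90 ./multilane_ring_allreduce 2>&1 | selected_pml
```

Every rank reports its own selection, so sort -u collapses the repeats; if
more than one name comes out, the ranks did not all pick the same pml.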
-Tommy
On Tuesday, June 3, 2025 at 12:44:00 PM UTC-5 Mike Adams wrote:
> mpiru