Re: [OMPI users] Is ready send mode supported? Or can I force eager protocol for a subset of messages?

2020-10-27 Thread Oskar Lappi via users
Since we're using UCX, that is the framework that handles which protocol to choose, and I've asked the wrong mailing list :) /Oskar On 10/22/20 1:44 PM, Oskar Lappi via users wrote: Hi, a performance question, I have a distributed stencil loop that's sending several tens
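With pml/ucx, the eager-vs-rendezvous decision belongs to UCX, made against a size threshold exposed as the UCX_RNDV_THRESH environment variable. Below is a minimal sketch of forcing the eager path for messages under a chosen size; the 64 KiB value is an illustrative assumption, and exporting the variable through the launcher (e.g. mpirun -x UCX_RNDV_THRESH=65536) is the more usual route than calling setenv inside the program:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        /* Illustrative assumption: raise UCX's rendezvous threshold so
         * messages below 64 KiB take the eager path. This must happen
         * before MPI_Init, because UCX reads its environment during
         * initialization; setting it in the launch environment is the
         * more common approach. */
        setenv("UCX_RNDV_THRESH", "65536", 1);

        MPI_Init(&argc, &argv);
        /* ... application ... */
        MPI_Finalize();
        return 0;
    }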

[OMPI users] Is ready send mode supported? Or can I force eager protocol for a subset of messages?

2020-10-22 Thread Oskar Lappi via users
Hi, a performance question: I have a distributed stencil loop that sends several tens of slightly larger messages every iteration. I post double-buffered receives at initialization and immediately after a receive request is completed. I can therefore prove that the receive is posted on the
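MPI_Rsend (ready send) is part of the MPI standard and Open MPI accepts it, though whether a given transport actually exploits it is implementation-dependent. Here is a minimal sketch of the pattern described above: the receiver keeps two buffers in flight so a matching receive is always pre-posted, which is the precondition MPI_Rsend requires. The zero-byte acknowledgement is an assumption added here to make the re-post ordering explicit; in a real stencil loop the symmetric neighbour exchange usually provides that guarantee.

    #include <mpi.h>

    #define N     4096
    #define ITERS 10

    /* Sketch: rank 0 ready-sends to rank 1, which double-buffers its
     * receives so one is always posted before the matching MPI_Rsend. */
    int main(int argc, char **argv)
    {
        static double out[N], bufs[2][N];
        MPI_Request reqs[2];
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1)
            for (int i = 0; i < 2; i++)      /* pre-post both receives */
                MPI_Irecv(bufs[i], N, MPI_DOUBLE, 0, i,
                          MPI_COMM_WORLD, &reqs[i]);

        /* MPI_Rsend is only legal once the matching receive is posted;
         * the barrier guarantees that for the first two iterations. */
        MPI_Barrier(MPI_COMM_WORLD);

        for (int iter = 0; iter < ITERS; iter++) {
            int slot = iter % 2;
            if (rank == 0) {
                if (iter >= 2)   /* wait until this slot was re-posted */
                    MPI_Recv(NULL, 0, MPI_BYTE, 1, 100 + slot,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Rsend(out, N, MPI_DOUBLE, 1, slot, MPI_COMM_WORLD);
            } else if (rank == 1) {
                MPI_Wait(&reqs[slot], MPI_STATUS_IGNORE);
                /* ... consume bufs[slot] ... */
                if (iter + 2 < ITERS) {      /* re-post only if slot is reused */
                    MPI_Irecv(bufs[slot], N, MPI_DOUBLE, 0, slot,
                              MPI_COMM_WORLD, &reqs[slot]);
                    /* ack: tells the sender this slot's receive is posted again */
                    MPI_Send(NULL, 0, MPI_BYTE, 0, 100 + slot, MPI_COMM_WORLD);
                }
            }
        }
        MPI_Finalize();
        return 0;
    }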

[OMPI users] MPI_Init sometimes fails when using UCX/GPUDirect RDMA

2020-07-21 Thread Oskar Lappi via users
Hi again, and thank you to Florent for answering my questions last time. The answers were very helpful! We have some strange errors occurring randomly when running MPI jobs. We are using Open MPI 4.0.3 with UCX and GPUDirect RDMA and are running multi-node applications using SLURM on a cluster.

[OMPI users] openib BTL vs UCX. Which do I need to use GPUDirect RDMA?

2020-06-30 Thread Oskar Lappi via users
Hi, I'm trying to troubleshoot a problem: we don't seem to be getting the bandwidth we'd expect from our distributed CUDA program, where we're using Open MPI to pass data between GPUs in an HPC cluster. I thought I found a possible root cause, but now I'm unsure of how to fix this, since t
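For context on what "pass data between GPUs" means here: with a CUDA-aware Open MPI built against a CUDA-enabled UCX, a device pointer can be handed to MPI directly, and GPUDirect RDMA lets the NIC move the data without staging through host memory. A minimal sketch under those assumptions (exactly two ranks, one GPU each, error handling omitted):

    #include <mpi.h>
    #include <cuda_runtime.h>

    #define N (1 << 20)

    /* Sketch: rank 0 sends a GPU-resident buffer directly to rank 1.
     * Requires a CUDA-aware MPI; otherwise the device buffer must be
     * copied to host memory before the send. */
    int main(int argc, char **argv)
    {
        int rank;
        float *d_buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaMalloc((void **)&d_buf, N * sizeof(float));

        if (rank == 0)        /* device pointer passed straight to MPI */
            MPI_Send(d_buf, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(d_buf, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }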