Since we're using UCX, UCX is the framework that decides which protocol
to use, so I've asked the wrong mailing list :)
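
For anyone who finds this in the archives: UCX picks between its eager
and rendezvous protocols per message based on a size threshold, and that
cutoff can be overridden through the UCX_RNDV_THRESH environment
variable. The value and program name below are placeholders, not a
recommendation:

    # keep messages below 64 KiB on the eager path (example value only)
    mpirun --mca pml ucx -x UCX_RNDV_THRESH=65536 -np 16 ./stencil_app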
/Oskar
On 10/22/20 1:44 PM, Oskar Lappi via users wrote:
Hi, a performance question,

I have a distributed stencil loop that sends several tens of slightly
larger messages every iteration. I post double-buffered receives at
initialization and again immediately after a receive request completes.
I can therefore prove that the receive is posted on the receiving side
before the message arrives.
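
In sketch form, the pattern looks roughly like the self-contained
program below; the ring exchange, message size, and iteration count are
simplified placeholders rather than the real code:

    #include <mpi.h>

    #define N 4096   /* message size in doubles, placeholder */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int right = (rank + 1) % size;        /* send to right neighbor */
        int left  = (rank + size - 1) % size; /* receive from left one  */

        static double rbuf[2][N], sbuf[N];
        MPI_Request rreq[2];

        /* Pre-post both receive slots at initialization. */
        MPI_Irecv(rbuf[0], N, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &rreq[0]);
        MPI_Irecv(rbuf[1], N, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &rreq[1]);

        for (int iter = 0; iter < 100; iter++) {
            int slot = iter % 2;
            MPI_Send(sbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD);
            MPI_Wait(&rreq[slot], MPI_STATUS_IGNORE);
            /* ... stencil update would consume rbuf[slot] here ... */
            /* Re-post this slot right away, so a receive is always
               outstanding before the matching message arrives.      */
            MPI_Irecv(rbuf[slot], N, MPI_DOUBLE, left, 0,
                      MPI_COMM_WORLD, &rreq[slot]);
        }

        /* Two receives are still pending after the last iteration. */
        MPI_Cancel(&rreq[0]); MPI_Wait(&rreq[0], MPI_STATUS_IGNORE);
        MPI_Cancel(&rreq[1]); MPI_Wait(&rreq[1], MPI_STATUS_IGNORE);
        MPI_Finalize();
        return 0;
    }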
Hi again, and thank you to Florent for answering my questions last time. The
answers were very helpful!
We have some strange errors occurring randomly when running MPI jobs. We
are using Open MPI 4.0.3 with UCX and GPUDirect RDMA, and we run
multi-node applications using SLURM on a cluster.
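
A stack like this is typically brought up with the UCX PML selected
explicitly, along the lines of the command below; the transport list and
process count are illustrative only, not our exact site configuration:

    # select the UCX PML and GPU-capable UCX transports explicitly
    mpirun --mca pml ucx -x UCX_TLS=rc,cuda_copy,gdr_copy -np 8 ./app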
Hi,

I'm trying to troubleshoot a problem: we don't seem to be getting the
bandwidth we'd expect from our distributed CUDA program, where we're
using Open MPI to pass data between GPUs in an HPC cluster.

I thought I found a possible root cause, but now I'm unsure of how to
fix this, since t
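
To make "expected bandwidth" concrete, a minimal CUDA-aware ping-pong
along the lines below is one way to check what the fabric actually
delivers between two GPUs. It assumes an Open MPI build with CUDA
support (so device pointers can be passed straight to MPI calls); the
buffer size and iteration count are arbitrary choices:

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    /* Run with at least 2 ranks, one GPU visible per rank. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int bytes = 1 << 24;   /* 16 MiB per message, arbitrary */
        const int iters = 100;
        void *dbuf;
        cudaMalloc(&dbuf, bytes);    /* device buffer handed directly to MPI */

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(dbuf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(dbuf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(dbuf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(dbuf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = MPI_Wtime() - t0;
        if (rank == 0)   /* 2x: each iteration moves the buffer both ways */
            printf("approx. %.2f GB/s\n", 2.0 * iters * bytes / dt / 1e9);

        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }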