Hi again, and thank you to Florent for answering my questions last time. The
answers were very helpful!
We are seeing intermittent, seemingly random errors when running MPI jobs. We are
using Open MPI 4.0.3 with UCX and GPUDirect RDMA, and we run multi-node
applications under SLURM on a cluster.
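In case it helps to narrow things down, the sketch below is not our application, just a minimal CUDA-aware ping-pong of the kind that exercises the GPUDirect RDMA path (device buffers passed straight to MPI). The buffer size, device selection, and iteration count are arbitrary choices for illustration.

    /* Minimal two-rank CUDA-aware MPI ping-pong sketch (illustrative only). */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size != 2) {
            if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        const size_t n = 1 << 20;            /* 1 Mi doubles, arbitrary size */
        double *dbuf;
        cudaSetDevice(0);                    /* assumes one GPU per rank */
        cudaMalloc((void **)&dbuf, n * sizeof(double));
        cudaMemset(dbuf, 0, n * sizeof(double));

        /* Device pointer handed directly to MPI; with a CUDA-aware
           Open MPI + UCX build this should take the GPUDirect path. */
        for (int iter = 0; iter < 100; ++iter) {
            if (rank == 0) {
                MPI_Send(dbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(dbuf, n, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(dbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(dbuf, n, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
            }
        }

        if (rank == 0) printf("ping-pong completed\n");

        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }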
I'm using the InfiniBand drivers shipped with the CentOS 7 distribution, not the
Mellanox drivers. The version of Lustre we're using is built against the
distro drivers and breaks if the Mellanox drivers are installed.
Is there a particular version of UCX that should be used with Open MPI
4.0.4? I download