Some extra information: it turns out that if we disable rc_verbs, it
works on the machines that otherwise fail.
Works:
-x UCX_TLS=sm,ud
Fails:
-x UCX_TLS=sm,rc_v
We also found that the machines that are not working have a mix of IB
card eras:
Mixed ConnectX-3 & 4 fail
All ConnectX-3 works
Brock Palen
IG: brockp
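
A minimal sketch of the transport comparison described above, assuming
mpirun and IMB-MPI1 are on PATH. The helper name pingpong_cmd and the RUN
variable are hypothetical, introduced only for illustration; they are not
part of Open MPI or UCX. By default the script only prints the two
commands, so it is safe to run anywhere.

```shell
#!/bin/sh
# Dry-run helper: print the PingPong command for each UCX transport list.
# Set RUN=1 to actually execute them (assumes mpirun and IMB-MPI1 exist).

pingpong_cmd() {
    # $1 is the value passed to UCX_TLS, e.g. "sm,ud" or "sm,rc_v"
    printf 'mpirun -x UCX_TLS=%s IMB-MPI1 PingPong' "$1"
}

# sm,ud was reported to work; sm,rc_v (rc_verbs) was reported to fail
for tls in sm,ud sm,rc_v; do
    cmd=$(pingpong_cmd "$tls")
    echo "$cmd"
    if [ "${RUN:-0}" = "1" ]; then
        # Unquoted on purpose so the string splits into command words
        $cmd || echo "FAILED with UCX_TLS=$tls"
    fi
done
```

Running it with RUN=1 on one all-ConnectX-3 node pair and one mixed
ConnectX-3/4 pair would show whether the failure follows the transport,
the hardware mix, or both.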
We have some odd behavior after an update. The most severe issue is that
if UCX is allowed to use IB, it fails for anything except small messages.
Using PingPong from IMB:
OMPI_MCA_pml_ucx_verbose=100 mpirun -x UCX_LOG_LEVEL=DEBUG \
    -x UCX_MODULE_LOG_LEVEL=DEBUG IMB-MPI1 PingPong
[lh.arc-ts.umich.edu:
Hi George and Gilles.
Thanks a lot for taking the time to test the code I sent.
Since Gilles mentioned that all the tests he ran worked perfectly, I
decided to install a completely fresh *OMPI 4.1.0* and test again.
Happily, the OOM killer is no longer killing any processes and all my
experiments worked perfectly.