Hello Open MPI users,

UCX version: v1.16.0 (https://github.com/openucx/ucx/releases/download/v1.16.0)
Open MPI version: 5.0.5

Open MPI is installed with UCX, PMIx, libevent, and hwloc.

A job run on 4 nodes with 192 ranks per node fails with the following UCX error:

ucp_context.c:1112 UCX ERROR Failed to query resources: Out of memory

Is there a known reason for this failure? Are there any -mca pml parameters that would fix this error? Is a particular version of UCX recommended?

Thanks