Hello Open MPI users,

UCX version: 1.16.0 (https://github.com/openucx/ucx/releases/download/v1.16.0)
Open MPI version: 5.0.5
Open MPI is installed with UCX, PMIx, libevent, and hwloc.

The job, which runs on 4 nodes with 192 ranks per node, fails with the
following UCX error:

ucp_context.c:1112 UCX  ERROR Failed to query resources: Out of memory

What could be causing this failure?

Are there any -mca pml parameters that would fix this error?

Is there a particular version of UCX that you would recommend?

Thanks
