[OMPI users] [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma, basesmuma, ucx_p2p:basesmsocket, basesmuma, p2p

2022-11-07 Thread mrlong via users
Running Open MPI 5.0.0rc9 produces the following:
(py3.9) [user@machine01 share]$ mpirun -n 2 python test.py
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited

[OMPI users] There are not enough slots available in the system to satisfy the 2, slots that were requested by the application

2022-11-07 Thread mrlong via users
*Two machines, each with 64 cores. The contents of the hosts file are:*
192.168.180.48 slots=1
192.168.60.203 slots=1
*Why do I get the following error when running with Open MPI 5.0.0rc9?*
(py3.9) [user@machine01 share]$ mpirun -n 2 --machinefile hosts hostname --
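With slots=1 on each of the two hosts, the hostfile should provide the 2 slots that `mpirun -n 2` requests. The helper below is a minimal sketch, not part of Open MPI. `total_slots` is a hypothetical name, and it handles only the simple `<host> slots=<N>` form shown above. It totals the slots=N entries so that arithmetic is explicit:

```python
def total_slots(hostfile_text: str) -> int:
    """Sum the slots=N values in a simple Open MPI-style hostfile.

    Sketch only (hypothetical helper): handles lines of the form
    "<host> slots=<N>"; comments (#) and blank lines are skipped.
    Hosts without an explicit slots= entry are ignored here, because
    the real default is decided by mpirun, not by this helper.
    """
    total = 0
    for raw in hostfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        for token in line.split()[1:]:       # tokens after the host name
            key, sep, value = token.partition("=")
            if sep and key == "slots":
                total += int(value)
    return total


hosts = """\
192.168.180.48 slots=1
192.168.60.203 slots=1
"""
requested = 2  # from `mpirun -n 2`
available = total_slots(hosts)
print(available, "slots available;", "enough" if available >= requested else "not enough")
# prints: 2 slots available; enough
```

This only checks the arithmetic. It does not reproduce how mpirun itself parses the file or maps processes to hosts.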

Re: [OMPI users] [OMPI devel] There are not enough slots available in the system to satisfy the 2, slots that were requested by the application

2022-11-07 Thread Jeff Squyres (jsquyres) via users
In the future, can you please just mail one of the lists? This particular question is probably more of a users type of question (since we're not talking about the internals of Open MPI itself), so I'll reply just on the users list. For what it's worth, I'm unable to replicate your error: $ mp

Re: [OMPI users] --mca btl_base_verbose 30 not working in version 5.0

2022-11-07 Thread Jeff Squyres (jsquyres) via users
Sorry for the delay in replying. To tie up this thread for the web mail archives: this same question was cross-posted over in the devel list; I replied there. -- Jeff Squyres jsquy...@cisco.com From: users on behalf of mrlong via users Sent: Sunday, October 30

Re: [OMPI users] [OMPI devel] [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma, basesmuma, ucx_p2p:basesmsocket, basesmuma, p2p

2022-11-07 Thread Ben Menadue via users
Hi, we see this on our cluster as well. We traced it to the fact that Python loads shared-library extensions using RTLD_LOCAL. The Python module (mpi4py?) has a dependency on libmpi.so, which in turn has a dependency on libhcoll.so. So the Python module is being loaded with RTLD_LOCAL, anything tha
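The RTLD_LOCAL behaviour described above can be changed from the Python side. The interpreter's `sys.setdlopenflags` controls the flags used when it dlopen()s extension modules. The sketch below temporarily adds RTLD_GLOBAL around a single import, which makes the extension's symbols (and, with glibc, those of the libraries it pulls in) globally visible. This is a commonly used generic workaround, not something the thread confirms fixes the hcoll/basesmuma error. `import_with_global_symbols` is a hypothetical helper name, and the sketch is POSIX-only because `os.RTLD_GLOBAL` does not exist on Windows.

```python
import importlib
import os
import sys


def import_with_global_symbols(module_name: str):
    """Import a module while extension modules are dlopen()ed with RTLD_GLOBAL.

    Hypothetical helper. It only affects shared libraries loaded during
    this import; if the module is already in sys.modules, the cached
    module is returned and the flags have no effect.
    """
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(os.RTLD_NOW | os.RTLD_GLOBAL)
    try:
        return importlib.import_module(module_name)
    finally:
        sys.setdlopenflags(old_flags)  # restore the interpreter's previous flags
```

In a script like test.py, this would be called before anything else imports mpi4py, e.g. `MPI = import_with_global_symbols("mpi4py.MPI")`. Restoring the old flags afterwards keeps RTLD_GLOBAL from leaking into unrelated extension imports.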