Running Open MPI 5.0.0rc9 produces the following:
(py3.9) [user@machine01 share]$ mpirun -n 2 python test.py
[LOG_CAT_ML] component basesmuma is not available but requested in
hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited
*Two machines, each with 64 cores. The contents of the hosts file are:*
192.168.180.48 slots=1
192.168.60.203 slots=1
*Why do I get the following error when running with Open MPI 5.0.0rc9?*
(py3.9) [user@machine01 share]$ mpirun -n 2 --machinefile hosts hostname
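The [LOG_CAT_ML] messages above come from the ML component inside Mellanox's hcoll library, so one way to test whether hcoll is the culprit is to disable that collective component via an MCA environment variable before launching. This is a hedged troubleshooting sketch, not a confirmed fix; the parameter name assumes an hcoll-enabled Open MPI build, and the mpirun line is left commented since it needs the cluster environment:

```shell
# Assumed workaround: disable the hcoll collective component via MCA.
export OMPI_MCA_coll_hcoll_enable=0
# then relaunch as before:
# mpirun -n 2 python test.py
```

If the [LOG_CAT_ML] lines disappear with hcoll disabled, that narrows the problem to the hcoll/ML hierarchy discovery rather than Open MPI proper.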
--
In the future, can you please just mail one of the lists? This particular
question is probably more of a users type of question (since we're not talking
about the internals of Open MPI itself), so I'll reply just on the users list.
For what it's worth, I'm unable to replicate your error:
$ mp
Sorry for the delay in replying.
To tie up this thread for the web mail archives: this same question was
cross-posted over in the devel list; I replied there.
--
Jeff Squyres
jsquy...@cisco.com
From: users on behalf of mrlong via users
Sent: Sunday, October 30
Hi,
We see this on our cluster as well; we traced it to the fact that Python loads
shared library extensions using RTLD_LOCAL.
The Python module (mpi4py?) depends on libmpi.so, which in turn depends on
libhcoll.so. So because the Python module is being loaded with
RTLD_LOCAL, anything tha
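If RTLD_LOCAL loading is indeed the cause, one commonly suggested workaround is to switch the interpreter's dlopen flags to RTLD_GLOBAL before importing the MPI extension, so that libmpi.so's symbols become visible to the libraries it pulls in later (libhcoll.so among them). A minimal sketch; the mpi4py import is left commented because it assumes mpi4py and Open MPI are installed on the machine:

```python
import os
import sys

# Save the interpreter's current dlopen flags, then add RTLD_GLOBAL so that
# symbols from an extension module's dependencies (e.g. libmpi.so) become
# visible to libraries loaded afterwards.
old_flags = sys.getdlopenflags()
sys.setdlopenflags(old_flags | os.RTLD_GLOBAL)

# from mpi4py import MPI   # assumed usage; requires mpi4py + Open MPI

# Restore the original flags so later extension imports are unaffected.
sys.setdlopenflags(old_flags)
```

The flag change only affects extensions imported while it is in force, which is why the save/restore bracketing around the MPI import matters.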