Hi,

We see this on our cluster as well. We traced it to the fact that Python loads shared library extensions using RTLD_LOCAL.
The Python module (mpi4py?) has a dependency on libmpi.so, which in turn has a dependency on libhcoll.so. Because the Python module is loaded with RTLD_LOCAL, everything it pulls in is loaded that way too. Later, hcoll tries to load its own plugin .so files, but since libhcoll.so was loaded with RTLD_LOCAL, those plugins can't resolve any of its symbols.

It might be fixable by linking the hcoll plugins against libhcoll.so, but since hcoll is a pre-built bundle from Mellanox, that's not something I can test easily. Otherwise, the workaround we use is to set LD_PRELOAD=libmpi.so when launching Python, so that libmpi.so gets loaded into the global namespace, as it would be with a “normal” compiled program.

Cheers,
Ben

> On 8 Nov 2022, at 1:48 am, Tomislav Janjusic via devel <de...@lists.open-mpi.org> wrote:
>
> Ugh - runtime command is literally in the e-mail.
>
> Sorry about that.
>
> --
> Tomislav Janjusic
> Staff Eng., Mellanox, HPC SW
> +1 (512) 598-0386
> NVIDIA <http://www.nvidia.com/>
>
> From: Tomislav Janjusic
> Sent: Monday, November 7, 2022 8:48 AM
> To: 'Open MPI Developers' <de...@lists.open-mpi.org>; Open MPI Users <users@lists.open-mpi.org>
> Cc: mrlong <mrlong...@gmail.com>
> Subject: RE: [OMPI devel] [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma, basesmuma, ucx_p2p:basesmsocket, basesmuma, p2p
>
> What is the runtime command?
> It’s coming from HCOLL. If HCOLL is not needed, feel free to disable it with -mca coll ^hcoll
>
> From: devel <devel-boun...@lists.open-mpi.org> On Behalf Of mrlong via devel
> Sent: Monday, November 7, 2022 2:33 AM
> To: de...@lists.open-mpi.org; Open MPI Users <users@lists.open-mpi.org>
> Cc: mrlong <mrlong...@gmail.com>
> Subject: [OMPI devel] [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma, basesmuma, ucx_p2p:basesmsocket, basesmuma, p2p
>
> Running Open MPI 5.0.0rc9 produces the following output:
>
> (py3.9) [user@machine01 share]$ mpirun -n 2 python test.py
> [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
> [LOG_CAT_ML] ml_discover_hierarchy exited with error
> [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
> [LOG_CAT_ML] ml_discover_hierarchy exited with error
>
> Why is this message printed?
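P.S. If changing the launch command for Ben's LD_PRELOAD=libmpi.so workaround isn't convenient, roughly the same effect can be had from inside the script by loading libmpi with RTLD_GLOBAL before mpi4py does. The following is a minimal sketch, assuming the module in question is mpi4py (as Ben guesses) and a glibc-based Linux system. The helper `preload_global` is hypothetical, not part of mpi4py or Open MPI, and looking libmpi up via `ctypes.util.find_library` is my own choice; you may need to pass the full path from your Open MPI install instead.

```python
import ctypes
import ctypes.util
from typing import Optional


def preload_global(name: str = "mpi") -> Optional[ctypes.CDLL]:
    """dlopen() a library with RTLD_GLOBAL (hypothetical helper, not an mpi4py API).

    `name` is either a path (anything containing "/") or a bare name such as
    "mpi", which ctypes.util.find_library maps to the installed soname.
    Returns None if a bare name can't be found; raises OSError if a given
    path can't be loaded.
    """
    path = name if "/" in name else ctypes.util.find_library(name)
    if path is None:
        return None
    # RTLD_GLOBAL (instead of the RTLD_LOCAL Python uses for extensions) puts
    # the library and the dependencies loaded with it, e.g. libhcoll.so,
    # into the global symbol scope, so plugins that hcoll dlopen()s later
    # can resolve symbols from them.
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)


if __name__ == "__main__":
    # Do this before `from mpi4py import MPI`: that import initializes MPI
    # by default, which is presumably when hcoll loads its plugins.
    if preload_global("mpi") is None:
        print("libmpi not found by name; pass its full path instead")
    # from mpi4py import MPI  # import MPI only after the preload
```

A broader variant with the same intent is calling `sys.setdlopenflags(sys.getdlopenflags() | os.RTLD_GLOBAL)` before the mpi4py import, but that makes every extension module imported afterwards load with RTLD_GLOBAL; the explicit preload only affects libmpi and its dependencies. And if hcoll isn't needed at all, Tomislav's -mca coll ^hcoll avoids the problem entirely.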