Hi All,

Thanks for the feedback. I guess I'm a little perplexed about how we got here; I'd think if it was linking against the PMI stuff that slurm version wouldn't matter? There aren't versioned PMI libraries:

/usr/lib64/libpmi.so
/usr/lib64/libpmi.so.0
/usr/lib64/libpmi.so.0.0.0 (real file)
/usr/lib64/libpmi2.so
/usr/lib64/libpmi2.so.0
/usr/lib64/libpmi2.so.0.0.0 (real file)
FWIW slurm has:

/usr/lib64/libslurm.so
/usr/lib64/libslurm.so.29 (real file)

An easy temporary fix is just to make a symlink named libslurm.so.28 that points to libslurm.so.29; things just work. Not really a long-term strategy, but it gets folks running again.

Sounds like I should follow up with the slurm list.

Regards,

Will

> On Jan 29, 2016, at 3:59 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
> on second thought, is there any chance your sysadmin removed the old libslurm.so.x but kept the old libpmix.so.y ?
> in this case, the real issue would be hidden
> your sysadmin "broke" the old libpmi, but you want to use the new one indeed.
>
> Cheers,
>
> Gilles
>
> On Friday, January 29, 2016, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> Is openmpi linked with a static libpmi.a that requires a dynamic libslurm ?
> that can be checked with ldd mca_ess_pmi.so
>
> btw, do slurm folks increase the libpmi.so version each time slurm is upgraded ?
> that could be a part of the issue ...
> but if they increase lib version because of abi changes, it might be a bad idea to open libxxx.so instead of libxxx.so.y
> generally speaking, libxxx.so.y is provided by libxxx package, and libxxx.so is provided by libxxx-devel package, which means it might not be available on compute nodes.
> we could also dlopen libxxx instead of linking with it, and have the sysadmin configure openmpi so it finds the right lib (this approach is used by a prominent vendor, and has other pros but also cons)
>
> Cheers,
>
> Gilles
>
> On Friday, January 29, 2016, Ralph Castain <r...@open-mpi.org> wrote:
> It makes sense - but isn’t it slurm that is linking libpmi against libslurm? I don’t think we are making that connection, so it would be a slurm issue to change it.
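For the ldd check mentioned above, here is a rough sketch of how it could be scripted across a few files. It assumes a Linux node with ldd on the PATH and Python 3.9 or newer; missing_libs and check are hypothetical helper names of mine, not anything from Open MPI or slurm. It only reports which dependencies ldd marks as "not found".

```python
import subprocess


def missing_libs(ldd_output: str) -> list[str]:
    """Return the sonames that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libslurm.so.28 => not found"
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip().startswith("not found"):
            missing.append(name)
    return missing


def check(path: str) -> list[str]:
    """Run ldd on one shared object and return its unresolved dependencies."""
    # check=False: ldd exits non-zero for some inputs; we only want its text.
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libs(result.stdout)
```

Calling check() on both mca_ess_pmi.so and /usr/lib64/libpmi.so.0 and comparing the results might hint at where the libslurm.so.28 dependency comes from: if only the Open MPI component reports it missing, the reference is probably not coming through the currently installed libpmi.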
>> On Jan 28, 2016, at 10:12 PM, William Law <willthe...@gmail.com> wrote:
>>
>> Hi,
>>
>> Our group can't find any way to do this and it'd be helpful.
>>
>> We use slurm and keep upgrading the slurm environment. OpenMPI bombs out against PMI each time the libslurm stuff changes, which seems to be fairly regularly. Is there a way to compile against slurm but insulate ourselves from the libslurm chaos? Obviously will ask the slurm folks too.
>>
>> [wlaw@some-node /scratch/users/wlaw/imb/src]$ mpirun -n 2 --mca grpcomm ^pmi ./IMB-MPI1
>> [some-node.local:42584] mca: base: component_find: unable to open /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_ess_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
>> [some-node.local:42585] mca: base: component_find: unable to open /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_pubsub_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
>> [some-node.local:42586] mca: base: component_find: unable to open /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_pubsub_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
>>
>> (sent it via the wrong email so it bounced..... heh)
>>
>> Upon further investigation it seems like the most appropriate thing would be to point it at compile time to libslurm.so instead of libslurm.so.xx; does that make sense?
>>
>> Thanks,
>>
>> Will
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: http://www.open-mpi.org/community/lists/users/2016/01/28408.php
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2016/01/28415.php