William,

can you run

ldd /usr/lib64/libpmi.so.0.0.0
ldd /usr/lib64/libpmi2.so.0.0.0
ldd /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_ess_pmi.so

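If it is easier, the three ldd checks above can be wrapped in a small loop that prints only the dependencies the dynamic loader cannot resolve. This is just a rough sketch: the function name report_missing is made up for illustration, and it assumes ldd marks unresolved libraries with the text "not found", as glibc's ldd does.

```shell
#!/bin/sh
# Rough sketch: list unresolved shared-library dependencies per file.
# report_missing is an illustrative name, not part of slurm or Open MPI.
report_missing() {
    for lib in "$@"; do
        if [ ! -e "$lib" ]; then
            echo "$lib: no such file"
            continue
        fi
        # glibc's ldd prints "libfoo.so.N => not found" for each
        # dependency the loader cannot locate.
        missing=$(ldd "$lib" 2>/dev/null | grep 'not found')
        if [ -n "$missing" ]; then
            echo "$lib: unresolved dependencies:"
            echo "$missing"
        else
            echo "$lib: no unresolved dependencies reported"
        fi
    done
}

report_missing \
    /usr/lib64/libpmi.so.0.0.0 \
    /usr/lib64/libpmi2.so.0.0.0 \
    /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_ess_pmi.so
```

Whichever file reports libslurm.so.28 as unresolved is the one carrying the hard dependency on the old slurm library.
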
Cheers,

Gilles

On Tuesday, February 2, 2016, William Law <willthe...@gmail.com> wrote:

> Hi All,
>
> Thanks for the feedback. I guess I'm a little perplexed about how we got
> here; I'd think if it was linking against the PMI stuff that slurm version
> wouldn't matter? There aren't versioned PMI libraries:
> /usr/lib64/libpmi.so
> /usr/lib64/libpmi.so.0
> /usr/lib64/libpmi.so.0.0.0 (real file)
> /usr/lib64/libpmi2.so
> /usr/lib64/libpmi2.so.0
> /usr/lib64/libpmi2.so.0.0.0 (real file)
>
> FWIW slurm has:
> /usr/lib64/libslurm.so
> /usr/lib64/libslurm.so.29 (real file)
>
> An easy temporary fix is just to make a symlink named libslurm.so.28 that
> points to libslurm.so.29; things just work. Not really a long term strategy
> but gets folks running again.
>
> Sounds like I should follow up with the slurm list.
>
> Regards,
>
> Will
>
> On Jan 29, 2016, at 3:59 AM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>
> on second thought, is there any chance your sysadmin removed the old
> libslurm.so.x but kept the old libpmix.so.y ?
> in this case, the real issue would be hidden
> your sysadmin "broke" the old libpmi, but you want to use the new one
> indeed.
>
> Cheers,
>
> Gilles
>
> On Friday, January 29, 2016, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>
>> Is openmpi linked with a static libpmi.a that requires a dynamic
>> libslurm ?
>> that can be checked with ldd mca_ess_pmi.so
>>
>> btw, do slurm folks increase the libpmi.so version each time slurm is
>> upgraded ?
>> that could be a part of the issue ...
>> but if they increase lib version because of abi changes, it might be a
>> bad idea to open libxxx.so instead of libxxx.so.y
>> generally speaking, libxxx.so.y is provided by libxxx package, and
>> libxxx.so is provided by libxxx-devel package, which means it might not be
>> available on compute nodes.
>> we could also dlopen libxxx instead of linking with it, and have the
>> sysadmin configure openmpi so it finds the right lib (this approach is used
>> by a prominent vendor, and has other pros but also cons)
>>
>> Cheers,
>>
>> Gilles
>>
>> On Friday, January 29, 2016, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> It makes sense - but isn’t it slurm that is linking libpmi against
>>> libslurm? I don’t think we are making that connection, so it would be a
>>> slurm issue to change it.
>>>
>>>
>>> On Jan 28, 2016, at 10:12 PM, William Law <willthe...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> Our group can't find any way to do this and it'd be helpful.
>>>
>>> We use slurm and keep upgrading the slurm environment. OpenMPI bombs
>>> out against PMI each time the libslurm stuff changes, which seems to be
>>> fairly regularly. Is there a way to compile against slurm but insulate
>>> ourselves from the libslurm chaos? Obviously will ask the slurm folks too.
>>>
>>> [wlaw@some-node /scratch/users/wlaw/imb/src]$ mpirun -n 2 --mca grpcomm ^pmi ./IMB-MPI1
>>> [some-node.local:42584] mca: base: component_find: unable to open
>>> /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_ess_pmi:
>>> libslurm.so.28: cannot open shared object file: No such file or directory
>>> (ignored)
>>> [some-node.local:42585] mca: base: component_find: unable to open
>>> /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_pubsub_pmi:
>>> libslurm.so.28: cannot open shared object file: No such file or directory
>>> (ignored)
>>> [some-node.local:42586] mca: base: component_find: unable to open
>>> /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_pubsub_pmi:
>>> libslurm.so.28: cannot open shared object file: No such file or directory
>>> (ignored)
>>>
>>> (sent it via the wrong email so it bounced..... heh)
>>>
>>> Upon further investigation it seems like the most appropriate thing
>>> would be to point it at compile time to libslurm.so instead of
>>> libslurm.so.xx; does that make sense?
>>>
>>> Thanks,
>>>
>>> Will
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2016/01/28408.php
>>>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/01/28415.php