Is Open MPI linked with a static libpmi.a that requires a dynamic libslurm?
That can be checked with ldd mca_ess_pmi.so.
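ldd lists each shared-library dependency and marks the ones it cannot resolve as "not found". To reproduce the kind of error the component loader prints, a minimal C sketch can also dlopen the component directly with RTLD_NOW and print dlerror(). This is an illustrative diagnostic, not part of Open MPI; the helper name check_component is hypothetical, and the caller supplies the component path.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical diagnostic helper: try to load a shared object (e.g. an MCA
 * component such as mca_ess_pmi.so). A missing dependency such as
 * libslurm.so.28 makes dlopen() fail; RTLD_NOW also reports unresolved
 * symbols immediately instead of at first use.
 * Returns 0 on success; on failure returns -1 and copies dlerror()'s
 * message into errbuf. */
int check_component(const char *path, char *errbuf, size_t errlen)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        snprintf(errbuf, errlen, "%s", msg != NULL ? msg : "unknown dlopen error");
        return -1;
    }
    dlclose(handle);
    if (errlen > 0)
        errbuf[0] = '\0'; /* no error to report */
    return 0;
}
```

A caller would do something like `char err[512]; if (check_component(path, err, sizeof err) != 0) fprintf(stderr, "%s\n", err);`. Run against the mca_ess_pmi component from the log quoted below, it should report the missing libslurm.so.28 much as component_find does.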

BTW, do the Slurm folks increase the libpmi.so version each time Slurm is
upgraded? That could be part of the issue...
But if they increase the library version because of ABI changes, it might be
a bad idea to open libxxx.so instead of libxxx.so.y.
Generally speaking, libxxx.so.y is provided by the libxxx package, and
libxxx.so is provided by the libxxx-devel package, which means it might not
be available on compute nodes.
We could also dlopen libxxx instead of linking with it, and have the
sysadmin configure Open MPI so it finds the right lib (a prominent vendor
uses this approach; it has other pros, but also cons).
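A minimal sketch of the dlopen approach, assuming PMI-1's `int PMI_Init(int *spawned)` entry point. With this approach Open MPI would no longer record a link-time dependency on libpmi (and, through it, on one specific libslurm.so.NN). Instead, the library is located at run time from a path the sysadmin configures. The environment variable PMI_LIBRARY_PATH and the helper names are hypothetical, not an existing Open MPI or Slurm setting. The default deliberately uses a versioned runtime name, libpmi.so.0 (also an assumption), because the unversioned libpmi.so usually ships only in the -devel package.

```c
#define _POSIX_C_SOURCE 200809L /* POSIX.1-2008 declarations (dlopen, setenv) */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed default: the versioned runtime name, which the runtime package
 * (not only -devel) normally provides. */
#define DEFAULT_PMI_LIBRARY "libpmi.so.0"

/* PMI-1 signature of PMI_Init (assumption: PMI-1 API). */
typedef int (*pmi_init_fn)(int *spawned);

/* Load the PMI library named by the hypothetical PMI_LIBRARY_PATH
 * environment variable, falling back to DEFAULT_PMI_LIBRARY.
 * Returns a dlopen handle, or NULL after printing dlerror(). */
void *open_pmi_library(void)
{
    const char *path = getenv("PMI_LIBRARY_PATH");
    if (path == NULL || path[0] == '\0')
        path = DEFAULT_PMI_LIBRARY;

    /* A missing dependency (e.g. libslurm.so.NN) makes dlopen fail here;
     * RTLD_NOW also reports unresolved symbols now, not at the first PMI call. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        fprintf(stderr, "cannot load PMI library '%s': %s\n", path, dlerror());
    return handle;
}

/* Resolve PMI_Init from an open handle; returns NULL if the handle is NULL
 * or the symbol is absent. */
pmi_init_fn lookup_pmi_init(void *handle)
{
    if (handle == NULL)
        return NULL;
    void *sym = dlsym(handle, "PMI_Init");
    /* POSIX permits converting dlsym()'s result to a function pointer. */
    return (pmi_init_fn)sym;
}
```

A caller would then do something like `pmi_init_fn init = lookup_pmi_init(open_pmi_library()); if (init != NULL) { int spawned; init(&spawned); }`. A Slurm upgrade then no longer breaks the link itself, but the configured path still has to point at a libpmi that matches the installed Slurm, which is likely one of the cons.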

Cheers,

Gilles

On Friday, January 29, 2016, Ralph Castain <r...@open-mpi.org> wrote:

> It makes sense, but isn’t it Slurm that is linking libpmi against
> libslurm? I don’t think we are making that connection, so changing it
> would be a Slurm issue.
>
>
> On Jan 28, 2016, at 10:12 PM, William Law <willthe...@gmail.com> wrote:
>
> Hi,
>
> Our group can't find any way to do this, and it'd be helpful.
>
> We use Slurm and keep upgrading the Slurm environment. Open MPI bombs out
> against PMI each time the libslurm stuff changes, which seems to be fairly
> regularly. Is there a way to compile against Slurm but insulate ourselves
> from the libslurm chaos? Obviously we will ask the Slurm folks too.
>
> [wlaw@some-node /scratch/users/wlaw/imb/src]$ mpirun -n 2 --mca grpcomm ^pmi ./IMB-MPI1
> [some-node.local:42584] mca: base: component_find: unable to open
> /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_ess_pmi:
> libslurm.so.28: cannot open shared object file: No such file or directory
> (ignored)
> [some-node.local:42585] mca: base: component_find: unable to open
> /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_pubsub_pmi:
> libslurm.so.28: cannot open shared object file: No such file or directory
> (ignored)
> [some-node.local:42586] mca: base: component_find: unable to open
> /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_pubsub_pmi:
> libslurm.so.28: cannot open shared object file: No such file or directory
> (ignored)
>
> (Sent it from the wrong email address, so it bounced... heh.)
>
> Upon further investigation, it seems the most appropriate thing would be
> to point it at libslurm.so at compile time instead of libslurm.so.xx;
> does that make sense?
>
> Thanks,
>
> Will
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/01/28408.php
>
>
>
