William,

Could you run the following and post the output?
ldd /usr/lib64/libpmi.so.0.0.0
ldd /usr/lib64/libpmi2.so.0.0.0
ldd /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_ess_pmi.so
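
To make the output easier to scan, the same checks can be wrapped in a small loop that prints only the Slurm-related and unresolved entries. This is only a sketch: the function name show_slurm_deps is made up for illustration, and it assumes a Linux node with ldd and grep available.

```shell
# Hypothetical helper: for each file given as an argument, print the
# libslurm-related and unresolved ("not found") lines reported by ldd.
show_slurm_deps() {
    for lib in "$@"; do
        echo "== $lib"
        if [ ! -e "$lib" ]; then
            echo "   (file does not exist on this node)"
            continue
        fi
        # ldd prints one dependency per line; keep only the interesting ones.
        ldd "$lib" 2>&1 | grep -E 'libslurm|not found' \
            || echo "   (no libslurm or missing dependencies reported)"
    done
}

# Example call with the paths from this message:
#   show_slurm_deps /usr/lib64/libpmi.so.0.0.0 /usr/lib64/libpmi2.so.0.0.0 \
#       /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_ess_pmi.so
```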

Cheers,

Gilles

On Tuesday, February 2, 2016, William Law <willthe...@gmail.com> wrote:

> Hi All,
>
> Thanks for the feedback.  I'm a little perplexed about how we got here: if
> Open MPI links against the PMI libraries, I'd expect the Slurm version not
> to matter.  The PMI libraries don't carry a Slurm-specific version:
> /usr/lib64/libpmi.so
> /usr/lib64/libpmi.so.0
> /usr/lib64/libpmi.so.0.0.0 (real file)
> /usr/lib64/libpmi2.so
> /usr/lib64/libpmi2.so.0
> /usr/lib64/libpmi2.so.0.0.0 (real file)
>
> FWIW slurm has:
> /usr/lib64/libslurm.so
> /usr/lib64/libslurm.so.29 (real file)
>
> An easy temporary fix is to create a symlink named libslurm.so.28 that
> points to libslurm.so.29; things then just work.  It's not really a
> long-term strategy, but it gets folks running again.
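
For reference, the workaround described above amounts to something like the following. It is a sketch only: the function name and its arguments are illustrative, the version numbers are the ones quoted in this thread, and it papers over an soname change, so it is only safe if the part of the libslurm ABI that libpmi uses did not actually change between the two versions.

```shell
# Hypothetical helper: create a compatibility symlink so that binaries still
# asking for the old soname resolve to the library that is actually installed.
# Arguments: library directory, installed file name, old soname being requested.
link_old_soname() {
    libdir=$1; installed=$2; old=$3
    if [ ! -e "$libdir/$installed" ]; then
        echo "missing $libdir/$installed" >&2
        return 1
    fi
    # -s creates a symbolic link: the link named $old points at $installed
    # (relative target, so it stays valid inside $libdir).
    ln -s "$installed" "$libdir/$old"
}

# On the system described above this would be (as root):
#   link_old_soname /usr/lib64 libslurm.so.29 libslurm.so.28
```

Removing the link again once Open MPI has been rebuilt against the current Slurm keeps the mismatch from being hidden later.
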
>
> Sounds like I should follow up with the slurm list.
>
> Regards,
>
> Will
>
> On Jan 29, 2016, at 3:59 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
> On second thought, is there any chance your sysadmin removed the old
> libslurm.so.x but kept the old libpmix.so.y?
> In that case, the real issue would be hidden:
> your sysadmin "broke" the old libpmi, but you want to use the new one
> anyway.
>
> Cheers,
>
> Gilles
>
> On Friday, January 29, 2016, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
>> Is Open MPI linked with a static libpmi.a that requires a dynamic
>> libslurm? That can be checked with ldd mca_ess_pmi.so.
>>
>> By the way, do the Slurm folks increase the libpmi.so version each time
>> Slurm is upgraded? That could be part of the issue.
>> But if they increase the library version because of ABI changes, it might
>> be a bad idea to open libxxx.so instead of libxxx.so.y.
>> Generally speaking, libxxx.so.y is provided by the libxxx package and
>> libxxx.so by the libxxx-devel package, which means libxxx.so might not be
>> available on compute nodes.
>> We could also dlopen libxxx instead of linking with it, and have the
>> sysadmin configure Open MPI so it finds the right library (this approach is
>> used by a prominent vendor, and has other pros but also cons).
>>
>> Cheers,
>>
>> Gilles
>>
>> On Friday, January 29, 2016, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> It makes sense - but isn’t it slurm that is linking libpmi against
>>> libslurm? I don’t think we are making that connection, so it would be a
>>> slurm issue to change it.
>>>
>>>
>>> On Jan 28, 2016, at 10:12 PM, William Law <willthe...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> Our group can't find any way to do this, and it would be helpful.
>>>
>>> We use Slurm and keep upgrading the Slurm environment.  Open MPI bombs
>>> out against PMI each time libslurm changes, which seems to happen
>>> fairly regularly.  Is there a way to compile against Slurm but insulate
>>> ourselves from the libslurm chaos?  Obviously I will ask the Slurm folks too.
>>>
>>> [wlaw@some-node /scratch/users/wlaw/imb/src]$ mpirun -n 2 --mca
>>> grpcomm ^pmi ./IMB-MPI1
>>> [some-node.local:42584] mca: base: component_find: unable to open
>>> /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_ess_pmi:
>>> libslurm.so.28: cannot open shared object file: No such file or directory
>>> (ignored)
>>> [some-node.local:42585] mca: base: component_find: unable to open
>>> /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_pubsub_pmi:
>>> libslurm.so.28: cannot open shared object file: No such file or directory
>>> (ignored)
>>> [some-node.local:42586] mca: base: component_find: unable to open
>>> /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_pubsub_pmi:
>>> libslurm.so.28: cannot open shared object file: No such file or directory
>>> (ignored)
>>>
>>> (sent it via the wrong email so it bounced..... heh)
>>>
>>> Upon further investigation, it seems the most appropriate thing would
>>> be to point it at libslurm.so at compile time instead of
>>> libslurm.so.xx; does that make sense?
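
One caveat on this idea: on ELF systems the linker copies the SONAME that a shared library advertises into the NEEDED entry of whatever is linked against it, so pointing at the unversioned libslurm.so at build time normally still records a versioned name such as libslurm.so.28. The sketch below shows one way to check what is actually recorded. It assumes readelf from binutils is installed, and the two helper names are made up for illustration.

```shell
# Hypothetical helpers that parse `readelf -d` output read from stdin.

# Print the sonames a binary or plugin asks the dynamic loader for.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)\].*/\1/p'
}

# Print the soname a shared library advertises to the linker.
soname_of() {
    sed -n 's/.*(SONAME).*\[\(.*\)\].*/\1/p'
}

# Example use on a Slurm node (paths taken from this thread):
#   readelf -d /usr/lib64/libpmi.so.0.0.0 | needed_libs
#   readelf -d /usr/lib64/libslurm.so | soname_of
```

If the first command lists a versioned libslurm name, the dependency comes from libpmi itself rather than from how Open MPI was compiled.
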
>>>
>>> Thanks,
>>>
>>> Will
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2016/01/28408.php
>>>
>>>
>>> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/01/28415.php
>
>
>
