Hi Sangam

Could you check that the install location of the library is the same on all the nodes? Maybe by checking LD_LIBRARY_PATH after sourcing the Intel vars.sh file?
I'm using OpenMPI 5.0.6 but in a Slurm context and it works fine.
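
The check suggested above could be sketched roughly as follows. This is only an illustration: `find_lib_in_path` is a hypothetical helper (not part of OpenMPI, PBS, or the Intel tools), and the commented loop assumes passwordless ssh to the nodes listed in $PBS_NODEFILE.

```shell
# Hypothetical helper: print every directory entry of a colon-separated
# search path that contains the given library file.
# The function body runs in a subshell, so IFS and "set -f" do not leak out.
find_lib_in_path() (
    lib=$1
    IFS=:
    set -f                      # no glob expansion while splitting the path
    for dir in $2; do
        # Empty entries (which the loader treats as the current directory)
        # are skipped here to keep the sketch simple.
        if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
            printf '%s\n' "$dir/$lib"
        fi
    done
)

# On the current node, after sourcing vars.sh:
find_lib_in_path libimf.so "${LD_LIBRARY_PATH:-}"

# Assumed usage inside the PBS job, comparing every allocated node:
#   for node in $(sort -u "$PBS_NODEFILE"); do
#       echo "== $node"
#       ssh "$node" "ls -l /opt/intel/oneapi/2024/v2.1/compiler/2024.2/lib/libimf.so"
#   done
```

If the helper prints nothing on a node, no directory in that LD_LIBRARY_PATH contains libimf.so there.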

Patrick

On 14/02/2025 at 19:00, Sangam B wrote:
Hi Patrick,

Thanks for your reply.
Of course, the Intel vars.sh is sourced inside the PBS script, and I've tried multiple ways to resolve this issue:

-x LD_LIBRARY_PATH
and
-x LD_LIBRARY_PATH=/opt/intel/oneapi/2024/v2.1/compiler/2024.2/opt/compiler/lib:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/lib:${LD_LIBRARY_PATH}

Then I copied libimf.so to the job's working directory and set
-x LD_LIBRARY_PATH=.:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/opt/compiler/lib:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/lib:${LD_LIBRARY_PATH}

But none of these worked.
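
Since the error is reported by prted rather than by the MPI program, one further check might be to list which shared libraries prted cannot resolve on the remote node. A rough sketch, where `missing_libs` is a hypothetical helper and the commented ssh line assumes prted is on the PATH of a non-interactive shell on cn21:

```shell
# Hypothetical helper: read "ldd" output on stdin and print the names of
# libraries that the dynamic loader reports as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Assumed usage from the node running mpirun (cn21 is the remote node
# named in the error message):
#   ssh cn21 'ldd "$(command -v prted)"' | missing_libs
```

If libimf.so shows up in that list, the environment of the non-interactive remote shell does not include the Intel library directory.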

On Fri, Feb 14, 2025 at 6:30 PM Patrick Begou <patrick.be...@univ-grenoble-alpes.fr> wrote:

    On 14/02/2025 at 13:22, Sangam B wrote:
    > Hi,
    >
    > OpenMPI-5.0.6 is compiled with ucx-1.18 and the Intel oneAPI 2024 v2.1
    > compilers. An MPI program is compiled with this openmpi-5.0.6.
    >
    > When submitting a job through PBS on a Linux cluster, the Intel
    > compiler environment is sourced and passed through OpenMPI's mpirun
    > option: " -x LD_LIBRARY_PATH=<lib path to intel compilers> ". But
    > the job still fails with the following error:
    >
    > prted: error while loading shared libraries: libimf.so: cannot open
    > shared object file: No such file or directory
    >
    > PRTE has lost communication with a remote daemon.
    >
    >   HNP daemon   : [prterun-cn19-2146925@0,0] on node cn19
    >   Remote daemon: [prterun-cn19-2146925@0,2] on node cn21
    >
    > This is usually due to either a failure of the TCP network
    > connection to the node, or possibly an internal failure of
    > the daemon itself. We cannot recover from this failure, and
    > therefore will terminate the job.
    >
    > However, if I put "source <path_of_intel_compiler>vars.sh" in
    > ~/.bashrc, then the job works fine. But this is not the right way
    > to do so.
    >
    > But my question is: after passing -x LD_LIBRARY_PATH to the
    > mpirun command, why is it not able to find "libimf.so" on all the
    > nodes? Is this a bug in OpenMPI-5.0.6?
    >
    > Thanks
    > To unsubscribe from this group and stop receiving emails from it, send
    > an email to users+unsubscr...@lists.open-mpi.org
    > <mailto:users%2bunsubscr...@lists.open-mpi.org>.


    Hi Sangam,

    The "-x" option propagates your LD_LIBRARY_PATH as it is set on the
    execution node. So maybe you only need to pass "-x LD_LIBRARY_PATH"
    after sourcing your <path_of_intel_compiler>vars.sh in your PBS
    script?

    Patrick (not using PBS but Slurm, sorry)
