On 14/02/2025 at 13:22, Sangam B wrote:
Hi,

OpenMPI-5.0.6 is compiled with ucx-1.18 and the Intel oneAPI 2024 v2.1 compilers. An MPI program is compiled with this OpenMPI-5.0.6 build.

When I submit a job through PBS on a Linux cluster, I source the Intel compilers and pass the same library path through OpenMPI's mpirun command option: " -x LD_LIBRARY_PATH=<lib path to intel compilers> ". But the job still fails with the following error:

prted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory

PRTE has lost communication with a remote daemon.

  HNP daemon   : [prterun-cn19-2146925@0,0] on node cn19
  Remote daemon: [prterun-cn19-2146925@0,2] on node cn21

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.

However, if I put "source <path_of_intel_compiler>vars.sh" in ~/.bashrc, then the job works fine. But this is not the right way to do it.

My question is: after passing -x LD_LIBRARY_PATH to the mpirun command, why is it not able to find "libimf.so" on all the nodes? Is this a bug in OpenMPI-5.0.6?

Thanks
To unsubscribe from this group and stop receiving emails from it, send an email to users+unsubscr...@lists.open-mpi.org.


Hi Sangam,

The "-x" option propagates your LD_LIBRARY_PATH as it is set on the execution node. So maybe you only need to pass "-x LD_LIBRARY_PATH" after sourcing your <path_of_intel_compiler>vars.sh in your PBS script?
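
To check that ordering from inside the job script, something like the helper below might be useful. It is only a sketch: find_lib is a hypothetical function name, ./my_mpi_app in the comments is a hypothetical program, and the vars.sh path is left as the placeholder from the original message. It looks only at the directories in LD_LIBRARY_PATH (the dynamic loader also consults other locations), and only on the node where the script itself runs.

```shell
#!/bin/bash
# find_lib NAME [SEARCH_PATH]
# Print the first directory of a colon-separated search path that
# contains NAME and return 0; return 1 if no directory contains it.
# SEARCH_PATH defaults to the current LD_LIBRARY_PATH.
find_lib() {
    local name=$1
    local search_path=${2-${LD_LIBRARY_PATH-}}
    local dir
    local IFS=:
    for dir in $search_path; do
        # Skip empty entries such as a leading or trailing ':'.
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Possible use in the PBS script (assumed layout, not tested with PBS):
#   source <path_of_intel_compiler>vars.sh
#   find_lib libimf.so || echo "libimf.so is not on LD_LIBRARY_PATH" >&2
#   mpirun -x LD_LIBRARY_PATH ./my_mpi_app
```

If find_lib prints nothing after vars.sh has been sourced, then "-x LD_LIBRARY_PATH" has nothing useful to propagate.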

Patrick (not using PBS but Slurm, sorry)
