On 14/02/2025 at 13:22, Sangam B wrote:
Hi,
OpenMPI-5.0.6 is compiled with ucx-1.18 and the Intel oneAPI 2024 v2.1
compilers. An MPI program is compiled with this OpenMPI-5.0.6 build.
When submitting a job through PBS on a Linux cluster, the Intel compiler
environment is sourced and the library path is passed through mpirun's
option: " -x LD_LIBRARY_PATH=<lib path to intel compilers> ". But
the job still fails with the following error:
prted: error while loading shared libraries: libimf.so: cannot open
shared object file: No such file or directory
PRTE has lost communication with a remote daemon.
HNP daemon : [prterun-cn19-2146925@0,0] on node cn19
Remote daemon: [prterun-cn19-2146925@0,2] on node cn21
This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
However, if I put "source <path_of_intel_compiler>vars.sh" in
~/.bashrc, the job works fine. But this is not the right way to do it.
My question is: after passing -x LD_LIBRARY_PATH to the
mpirun command, why is it not able to find "libimf.so" on all the
nodes? Is this a bug in OpenMPI-5.0.6?
Thanks
To unsubscribe from this group and stop receiving emails from it, send
an email to users+unsubscr...@lists.open-mpi.org.
Hi Sangam,
the "-x" option propagates your LD_LIBRARY_PATH as it is set on the
execution node. So maybe you only need to pass "-x LD_LIBRARY_PATH"
after sourcing your <path_of_intel_compiler>vars.sh in your PBS script?
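
A minimal sketch of that suggestion, assuming a bash job script. The
INTEL_VARS placeholder, the lib_on_path helper and ./my_mpi_program are
hypothetical names used only for illustration; any "#PBS" resource lines
are left out and would need to match your site. The idea is to source
the Intel environment inside the job script, check that libimf.so is
reachable through LD_LIBRARY_PATH in that shell, and only then call
mpirun with "-x LD_LIBRARY_PATH" (no "=value", so the current value is
exported).

```shell
#!/bin/bash
# Placeholder (assumption): point this at the real Intel vars.sh script.
INTEL_VARS="${INTEL_VARS:-/path/to/intel/vars.sh}"

# Load the Intel environment in the job script itself, so that
# LD_LIBRARY_PATH is set in the shell that invokes mpirun.
if [ -r "$INTEL_VARS" ]; then
    . "$INTEL_VARS"
fi

# Hypothetical helper: succeed if any directory listed in
# LD_LIBRARY_PATH contains the named library file.
lib_on_path() {
    local lib="$1" dir
    local IFS=:
    for dir in $LD_LIBRARY_PATH; do
        [ -e "$dir/$lib" ] && return 0
    done
    return 1
}

# Launch only if the library is visible from this shell.
if lib_on_path libimf.so; then
    # "-x LD_LIBRARY_PATH" exports the variable as currently set.
    mpirun -x LD_LIBRARY_PATH ./my_mpi_program
else
    echo "libimf.so not found via LD_LIBRARY_PATH; check INTEL_VARS" >&2
fi
```

The check only covers the node where the job script runs; it says
nothing about what the remote nodes see.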
Patrick (not using PBS but Slurm, sorry)