Bad answer, sorry. I had not realized that prted is part of the OpenMPI stack.
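Since the failing binary is prted itself rather than the MPI program, one way to see which shared libraries prted cannot resolve is to run ldd on it on the affected node. This is only a minimal sketch, assuming ldd is available there and prted is on PATH; the helper name missing_libs is made up for illustration.

```shell
# missing_libs (hypothetical helper): read `ldd` output on stdin and
# print the name of every library reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

Usage on a compute node such as cn21: ldd "$(command -v prted)" | missing_libs. If this prints libimf.so, the environment in which the daemon starts lacks the Intel runtime directory. As far as I understand, "-x" may not cover that case, because prted is started before the exported variables are applied to the application processes.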
On 14/02/2025 at 19:19, Patrick Begou wrote:
Hi Sangam
could you check that the install location of the library is the same
on all the nodes? Maybe by checking LD_LIBRARY_PATH after sourcing the
Intel vars.sh file?
I'm using OpenMPI 5.0.6 but in a Slurm context and it works fine.
Patrick
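The check suggested above (is libimf.so reachable through LD_LIBRARY_PATH once vars.sh has been sourced?) could be sketched as follows. The function name find_lib_in_path is an assumption for illustration, not part of OpenMPI or the Intel tools, and it only looks at LD_LIBRARY_PATH, not at the loader cache or rpath entries.

```shell
# find_lib_in_path NAME (hypothetical helper)
# Print the first LD_LIBRARY_PATH entry that contains NAME.
# Returns 1 if no entry of LD_LIBRARY_PATH holds the file.
find_lib_in_path() {
    local name="$1" dir
    local IFS=':'
    for dir in $LD_LIBRARY_PATH; do
        # The dynamic loader treats an empty entry as the current directory.
        [ -z "$dir" ] && dir="."
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}
```

Called as find_lib_in_path libimf.so in the PBS script right after sourcing vars.sh, and again on each compute node, it shows whether the directory exists on that node and not only on the node where the job script runs.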
On 14/02/2025 at 19:00, Sangam B wrote:
Hi Patrick,
Thanks for your reply.
Of course, the Intel vars.sh is sourced inside the PBS script, and I've
tried multiple ways to resolve this issue:
  -x LD_LIBRARY_PATH
and
  -x LD_LIBRARY_PATH=/opt/intel/oneapi/2024/v2.1/compiler/2024.2/opt/compiler/lib:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/lib:${LD_LIBRARY_PATH}
Then I copied libimf.so to the job's working directory and set
  -x LD_LIBRARY_PATH=.:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/opt/compiler/lib:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/lib:${LD_LIBRARY_PATH}
But it did not work in any of these cases.
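One detail about the values tried above: appending ":${LD_LIBRARY_PATH}" leaves a trailing colon when the variable is empty. A small sketch of a safer way to build the value before passing "-x LD_LIBRARY_PATH" follows. The helper name prepend_ld_path is hypothetical, and this only tidies the variable; it is not claimed to fix the prted error.

```shell
# prepend_ld_path DIR... (hypothetical helper)
# Put the given directories in front of LD_LIBRARY_PATH and export it.
prepend_ld_path() {
    local joined="" dir
    # Nothing to add: leave LD_LIBRARY_PATH untouched.
    [ "$#" -eq 0 ] && return 0
    for dir in "$@"; do
        joined="${joined:+$joined:}$dir"
    done
    # ${VAR:+...} adds the separator only when LD_LIBRARY_PATH is non-empty,
    # so no stray empty entry ends up in the exported value.
    export LD_LIBRARY_PATH="${joined}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

In the PBS script this would be called with the two Intel directories quoted above, before running mpirun with -x LD_LIBRARY_PATH.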
On Fri, Feb 14, 2025 at 6:30 PM Patrick Begou
<patrick.be...@univ-grenoble-alpes.fr> wrote:
On 14/02/2025 at 13:22, Sangam B wrote:
> Hi,
>
> OpenMPI-5.0.6 is compiled with ucx-1.18 and the Intel oneAPI 2024 v2.1
> compilers. An MPI program is compiled with this openmpi-5.0.6.
>
> When submitting a job through PBS on a Linux cluster, the Intel compiler
> environment is sourced and the same is passed through OpenMPI's mpirun
> command option: "-x LD_LIBRARY_PATH=<lib path to intel compilers>". But
> the job still fails with the following error:
>
> prted: error while loading shared libraries: libimf.so: cannot open
> shared object file: No such file or directory
>
> PRTE has lost communication with a remote daemon.
>
> HNP daemon : [prterun-cn19-2146925@0,0] on node cn19
> Remote daemon: [prterun-cn19-2146925@0,2] on node cn21
>
> This is usually due to either a failure of the TCP network
> connection to the node, or possibly an internal failure of
> the daemon itself. We cannot recover from this failure, and
> therefore will terminate the job.
>
> However, if I put "source <path_of_intel_compiler>vars.sh" in
> ~/.bashrc, then the job works fine. But this is not the right way to do it.
>
> But my question is: after passing -x LD_LIBRARY_PATH to the mpirun
> command, why is it not able to find "libimf.so" on all the nodes?
> Is this a bug in OpenMPI-5.0.6?
>
> Thanks
> To unsubscribe from this group and stop receiving emails from it, send
> an email to users+unsubscr...@lists.open-mpi.org.
Hi Sangam,
the "-x" option propagates your LD_LIBRARY_PATH as it is set on the
execution node. So maybe you only need to pass "-x LD_LIBRARY_PATH"
after sourcing your <path_of_intel_compiler>vars.sh in your PBS
script?
Patrick (not using PBS but Slurm, sorry)