Thanks Gilles & Patrick.

As Gilles mentioned, when Open MPI spawns the prted daemons on the compute
nodes, they fail to launch because the Intel runtime is not available there.
To resolve this, I loaded the Intel runtime in the terminal session before
submitting the job and used #PBS -V in the job script.
That resolved the issue.
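
For anyone hitting the same error, here is a minimal sketch of such a job
script. It is an illustration under assumptions, not my exact script: the
find_lib helper, the pre-flight check, and the executable name my_mpi_app are
hypothetical, and the check only looks at the node where the script itself
runs.

```shell
#!/bin/bash
#PBS -V    # pass the submission shell's environment (including LD_LIBRARY_PATH) to the job

# Hypothetical helper: print the first directory in a colon-separated
# search path that contains the given library, or return 1 if none does.
find_lib() {
    local lib="$1" search_path="$2" dir
    local IFS=':'
    for dir in $search_path; do
        if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Only launch when running inside a PBS job (PBS sets PBS_JOBID).
if [ -n "${PBS_JOBID:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1

    # Fail early if the Intel math library is not visible in the inherited environment.
    if ! find_lib libimf.so "${LD_LIBRARY_PATH:-}" >/dev/null; then
        echo "libimf.so not found on LD_LIBRARY_PATH; load the Intel runtime before qsub" >&2
        exit 1
    fi

    mpirun ./my_mpi_app   # hypothetical executable name
fi
```

The job is then submitted from a shell in which the Intel vars.sh has already
been sourced, so that #PBS -V has something to pass along.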

Other possible solutions:
(1) If Open MPI is built with the Intel compilers, use a static build
[link the Intel libs statically].
(2) Or build Open MPI with the gcc compilers [OS default] and use OMPI_CC=icc
etc. when compiling the application.
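
A rough sketch of option (2), assuming Open MPI was built with gcc so that
prted itself no longer needs libimf.so. OMPI_CC overrides the compiler that
the mpicc wrapper invokes; the INTEL_CC variable and the file names are
placeholders of mine, and "icc" simply follows the text above (recent oneAPI
releases ship icx instead, so adjust as needed).

```shell
#!/bin/bash
# Compiler the wrapper should invoke (placeholder variable, defaults to icc).
INTEL_CC="${INTEL_CC:-icc}"

if command -v mpicc >/dev/null 2>&1; then
    # Show the compiler the wrapper uses by default (the one Open MPI was built with).
    mpicc --showme:command

    # Show the compiler the wrapper uses once OMPI_CC is set for this invocation.
    OMPI_CC="$INTEL_CC" mpicc --showme:command

    # The application would then be compiled like this (hypothetical file names):
    # OMPI_CC="$INTEL_CC" mpicc -O2 -o my_mpi_app my_mpi_app.c
else
    echo "mpicc not found on PATH" >&2
fi
```

With this layout only the application depends on the Intel runtime, which is
the case that "-x LD_LIBRARY_PATH" on the mpirun command line is meant to
cover.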

Thanks



On Fri, Feb 14, 2025 at 11:53 PM Patrick Begou <
patrick.be...@univ-grenoble-alpes.fr> wrote:

> Bad answer, sorry. I did not realize prted was part of the Open MPI stack.
>
> Le 14/02/2025 à 19:19, Patrick Begou a écrit :
>
> Hi Sangam
>
> could you check that the install location of the library is the same on
> all the nodes? Maybe check LD_LIBRARY_PATH after sourcing the Intel
> vars.sh file?
> I'm using Open MPI 5.0.6, but in a Slurm context, and it works fine.
>
> Patrick
>
> Le 14/02/2025 à 19:00, Sangam B a écrit :
>
> Hi Patrick,
>
> Thanks for your reply.
> Of course, the Intel vars.sh is sourced inside the PBS script, and I've
> tried multiple ways to resolve this issue:
>
> -x LD_LIBRARY_PATH
> and
> -x LD_LIBRARY_PATH=/opt/intel/oneapi/2024/v2.1/compiler/2024.2/opt/compiler/lib:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/lib:${LD_LIBRARY_PATH}
>
> Then I copied libimf.so to the job's working directory and set
> -x LD_LIBRARY_PATH=.:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/opt/compiler/lib:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/lib:${LD_LIBRARY_PATH}
>
> But none of these worked.
>
> On Fri, Feb 14, 2025 at 6:30 PM Patrick Begou <
> patrick.be...@univ-grenoble-alpes.fr> wrote:
>
>> Le 14/02/2025 à 13:22, Sangam B a écrit :
>> > Hi,
>> >
>> > Open MPI 5.0.6 is compiled with ucx-1.18 and the Intel oneAPI 2024 v2.1
>> > compilers. An MPI program is compiled with this Open MPI 5.0.6.
>> >
>> > When submitting a job through PBS on a Linux cluster, the Intel compiler
>> > environment is sourced and passed through Open MPI's mpirun command
>> > option: " -x LD_LIBRARY_PATH=<lib path to intel compilers> ". But
>> > the job still fails with the following error:
>> >
>> > prted: error while loading shared libraries: libimf.so: cannot open
>> > shared object file: No such file or directory
>> >
>> > PRTE has lost communication with a remote daemon.
>> >
>> >   HNP daemon   : [prterun-cn19-2146925@0,0] on node cn19
>> >   Remote daemon: [prterun-cn19-2146925@0,2] on node cn21
>> >
>> > This is usually due to either a failure of the TCP network
>> > connection to the node, or possibly an internal failure of
>> > the daemon itself. We cannot recover from this failure, and
>> > therefore will terminate the job.
>> >
>> > However, if I put "source <path_of_intel_compiler>vars.sh" in
>> > ~/.bashrc, then the job works fine. But this is not the right way to do it.
>> >
>> > My question is: after passing -x LD_LIBRARY_PATH to the mpirun
>> > command, why is it not able to find "libimf.so" on all the
>> > nodes? Is this a bug in Open MPI 5.0.6?
>> >
>> > Thanks
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an email to users+unsubscr...@lists.open-mpi.org.
>>
>>
>> Hi Sangam,
>>
>> the "-x" option propagates your LD_LIBRARY_PATH as it is set on the
>> execution node. So maybe you only need to pass "-x LD_LIBRARY_PATH"
>> after sourcing your <path_of_intel_compiler>vars.sh in your PBS script?
>>
>> Patrick (not using PBS but Slurm, sorry)
