Sangam,

-x LD_LIBRARY_PATH won't do the trick here.

mpirun spawns prted daemons on the remote nodes (via the tm interface, or
whatever the latest PBS uses if support was built into Open MPI, or via SSH
otherwise), and those daemons fail to start because the Intel runtime cannot
be found. Note that -x only affects the environment of your MPI application
processes, not of the prted daemons themselves, which are launched first.
You can run "chrpath -l prted" to check whether prted was built with an
rpath. If so, make sure the Intel runtime is available at that same
location on every node.
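For example, a quick diagnostic along those lines (binary lookup and the
node name cn21 from your log are illustrative; adjust to your install):

```shell
# List any rpath embedded in prted (requires the chrpath utility):
chrpath -l "$(command -v prted)"

# Alternative without chrpath, using binutils:
readelf -d "$(command -v prted)" | grep -E 'RPATH|RUNPATH'

# Verify that libimf.so actually resolves on a remote node, where your
# interactive environment (vars.sh) has NOT been sourced:
ssh cn21 'ldd "$(command -v prted)" | grep libimf'
```

If the last command prints "not found", the daemon on that node cannot
start, regardless of what -x passes to the application processes.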


Another option is to rebuild PRRTE with the GNU compilers so it does not
depend on the Intel runtime.
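For instance, a rebuild could look like this (a sketch only: the install
prefix, UCX/PBS paths, and configure flags are assumptions to adapt to
your site, not your original build line):

```shell
# Rebuild Open MPI (which bundles PRRTE) with GCC so that prted itself
# has no dependency on libimf.so or the rest of the Intel runtime:
./configure CC=gcc CXX=g++ FC=gfortran \
    --prefix=/opt/openmpi-5.0.6-gcc \
    --with-ucx=/opt/ucx-1.18 \
    --with-tm=/opt/pbs        # keep PBS (tm) launch support
make -j"$(nproc)" && make install
```

The application itself can still be compiled with the Intel compilers;
once prted starts cleanly, -x LD_LIBRARY_PATH can take care of the
application's own runtime libraries.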

Cheers,

Gilles

On Sat, Feb 15, 2025 at 3:00 AM Sangam B <forum....@gmail.com> wrote:

> Hi Patrick,
>
> Thanks for your reply.
> Of course, the Intel vars.sh is sourced inside the PBS script, and I've
> tried multiple ways to resolve this issue:
>
> -x LD_LIBRARY_PATH
> &
> -x
> LD_LIBRARY_PATH=/opt/intel/oneapi/2024/v2.1/compiler/2024.2/opt/compiler/lib:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/lib:${LD_LIBRARY_PATH}
>
> And then copied the libimf.so to job's working directory and set
> -x
> LD_LIBRARY_PATH=.:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/opt/compiler/lib:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/lib:${LD_LIBRARY_PATH}
>
> But it didn't work in any of these cases.
>
> On Fri, Feb 14, 2025 at 6:30 PM Patrick Begou <
> patrick.be...@univ-grenoble-alpes.fr> wrote:
>
>> Le 14/02/2025 à 13:22, Sangam B a écrit :
>> > Hi,
>> >
>> > OpenMPI-5.0.6 is compiled with ucx-1.18 and the Intel oneAPI 2024 v2.1
>> > compilers. An MPI program is compiled with this openmpi-5.0.6.
>> >
>> > While submitting a job through PBS on a Linux cluster, the Intel
>> > compiler environment is sourced and the same is passed through
>> > OpenMPI's mpirun command option: " -x LD_LIBRARY_PATH=<lib path to
>> > intel compilers> ". But the job still fails with the following error:
>> >
>> > prted: error while loading shared libraries: libimf.so: cannot open
>> > shared object file: No such file or directory
>> >
>> > PRTE has lost communication with a remote daemon.
>> >
>> >   HNP daemon   : [prterun-cn19-2146925@0,0] on node cn19
>> >   Remote daemon: [prterun-cn19-2146925@0,2] on node cn21
>> >
>> > This is usually due to either a failure of the TCP network
>> > connection to the node, or possibly an internal failure of
>> > the daemon itself. We cannot recover from this failure, and
>> > therefore will terminate the job.
>> >
>> > However, if I put "source <path_of_intel_compiler>vars.sh" in
>> > ~/.bashrc, then the job works fine. But this is not the right way to
>> > do it.
>> >
>> > My question is: after passing -x LD_LIBRARY_PATH to the mpirun
>> > command, why is it not able to find "libimf.so" on all the nodes? Is
>> > this a bug in OpenMPI-5.0.6?
>> >
>> > Thanks
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an email to users+unsubscr...@lists.open-mpi.org.
>>
>>
>> Hi Sangam,
>>
>> The "-x" option propagates your LD_LIBRARY_PATH as it is set on the
>> execution node. So maybe you only need to set "-x LD_LIBRARY_PATH"
>> after sourcing your <path_of_intel_compiler>vars.sh in your PBS script?
>>
>> Patrick (not using PBS but Slurm, sorry)
>>
>
