Thanks Gilles & Patrick. As Gilles mentioned, when Open MPI spawns the prted daemons on the compute nodes, they fail to launch because the Intel runtime is not available there.
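A rough way to confirm which libraries prted cannot resolve is to filter the output of ldd. This is only a sketch: it assumes ldd and awk are available, and the function names parse_missing and check_binary are made up for illustration.

```shell
# Print the names of libraries that ldd reports as "not found".
# Reads ldd output on stdin.
parse_missing() {
    awk '/not found/ { print $1 }'
}

# Run ldd on a binary and list the libraries it cannot resolve
# in the current environment.
check_binary() {
    ldd "$1" 2>/dev/null | parse_missing
}

# Only check prted when it is actually on PATH.
if command -v prted >/dev/null 2>&1; then
    check_binary "$(command -v prted)"
fi
```

Running it on a compute node in a fresh non-interactive shell (for example over ssh, where the site allows it) is closer to the environment in which prted starts than running it on the login node after sourcing vars.sh.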
To resolve this, I loaded the Intel runtime in the terminal session before submitting the job and used #PBS -V in the job script, which exports the submission environment to the job. That resolved it.

Other possible solutions:
(1) If Open MPI is built with the Intel compilers, use a static build [link the Intel libs statically].
(2) Or build Open MPI with the gcc compilers [OS default] and use OMPI_CC=icc etc.

Thanks

On Fri, Feb 14, 2025 at 11:53 PM Patrick Begou <patrick.be...@univ-grenoble-alpes.fr> wrote:

> Bad answer, sorry. I did not realize prted was part of the Open MPI stack.
>
> On 14/02/2025 at 19:19, Patrick Begou wrote:
>
> Hi Sangam,
>
> Could you check that the install location of the library is the same on
> all the nodes? Maybe by checking LD_LIBRARY_PATH after sourcing the Intel
> vars.sh file?
> I'm using OpenMPI 5.0.6, but in a Slurm context, and it works fine.
>
> Patrick
>
> On 14/02/2025 at 19:00, Sangam B wrote:
>
> Hi Patrick,
>
> Thanks for your reply.
> Of course, the Intel vars.sh is sourced inside the PBS script, and I've
> tried multiple ways to resolve this issue:
>
> -x LD_LIBRARY_PATH
> &
> -x LD_LIBRARY_PATH=/opt/intel/oneapi/2024/v2.1/compiler/2024.2/opt/compiler/lib:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/lib:${LD_LIBRARY_PATH}
>
> And then I copied libimf.so to the job's working directory and set
> -x LD_LIBRARY_PATH=.:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/opt/compiler/lib:/opt/intel/oneapi/2024/v2.1/compiler/2024.2/lib:${LD_LIBRARY_PATH}
>
> But it didn't work in any of these cases.
>
> On Fri, Feb 14, 2025 at 6:30 PM Patrick Begou <patrick.be...@univ-grenoble-alpes.fr> wrote:
>
>> On 14/02/2025 at 13:22, Sangam B wrote:
>> > Hi,
>> >
>> > OpenMPI-5.0.6 is compiled with ucx-1.18 and the Intel oneAPI 2024 v2.1
>> > compilers. An MPI program is compiled with this openmpi-5.0.6.
>> >
>> > When submitting a job through PBS on a Linux cluster, the Intel compilers
>> > environment is sourced and passed through Open MPI's mpirun command
>> > option: " -x LD_LIBRARY_PATH=<lib path to intel compilers> ". But
>> > the job still fails with the following error:
>> >
>> > prted: error while loading shared libraries: libimf.so: cannot open
>> > shared object file: No such file or directory
>> >
>> > PRTE has lost communication with a remote daemon.
>> >
>> > HNP daemon : [prterun-cn19-2146925@0,0] on node cn19
>> > Remote daemon: [prterun-cn19-2146925@0,2] on node cn21
>> >
>> > This is usually due to either a failure of the TCP network
>> > connection to the node, or possibly an internal failure of
>> > the daemon itself. We cannot recover from this failure, and
>> > therefore will terminate the job.
>> >
>> > However, if I put "source <path_of_intel_compiler>vars.sh" in
>> > ~/.bashrc, then the job works fine. But this is not the right way to do it.
>> >
>> > My question is: after passing -x LD_LIBRARY_PATH to the
>> > mpirun command, why is it not able to find "libimf.so" on all the
>> > nodes? Is this a bug in OpenMPI-5.0.6?
>> >
>> > Thanks
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an email to users+unsubscr...@lists.open-mpi.org.
>>
>>
>> Hi Sangam,
>>
>> The "-x" option propagates your LD_LIBRARY_PATH as it is set on the
>> execution node. So maybe you only need to set "-x LD_LIBRARY_PATH"
>> after sourcing your <path_of_intel_compiler>vars.sh in your PBS script?
>>
>> Patrick (not using PBS but Slurm, sorry)
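For anyone finding this thread later, the workaround described at the top of this message (load the Intel runtime in the login shell, then submit with #PBS -V) might look roughly like the sketch below. The job name, the select line, the file name job.pbs and the binary ./my_mpi_app are placeholders, not values from the actual cluster.

```shell
# Write a hypothetical PBS job script that relies on "#PBS -V" to export
# the submission environment (including LD_LIBRARY_PATH) to the job.
cat > job.pbs <<'EOF'
#!/bin/bash
#PBS -N mpi_job
#PBS -l select=2:ncpus=4:mpiprocs=4
#PBS -V

cd "$PBS_O_WORKDIR"
mpirun ./my_mpi_app
EOF

# On the login node, before submission (path elided, as in the thread):
#   source <path_of_intel_compiler>vars.sh
#   qsub job.pbs
```

The important part is the order: the Intel environment has to be loaded in the shell that runs qsub, since #PBS -V only exports what is set at submission time.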