Hi all,

I am observing some strange behaviour with a dynamically linked binary inside an sbatch job. This binary is linked against, among others, the MPICH library - so when I run "ldd" on it I get

$ ldd /path/to/binary
    linux-vdso.so.1 => (0x00007ffd817c5000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00002ae4a3152000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ae4a3356000)
    libmpi.so.12 => not found
    libmpifort.so.12 => not found
    libm.so.6 => /lib64/libm.so.6 (0x00002ae4a3572000)
    librt.so.1 => /lib64/librt.so.1 (0x00002ae4a3874000)
    libc.so.6 => /lib64/libc.so.6 (0x00002ae4a3a7c000)
    /lib64/ld-linux-x86-64.so.2 (0x00002ae4a2f2e000)

showing me that it cannot find those shared objects, as I have not loaded any modules into my environment yet (this is expected).

Now, if I allocate some resources and start an interactive Slurm session via e.g.

$ srun -N 1 -c 4 -t 10:00 --pty bash

and load the appropriate modules (Lmod, btw.) into my environment, e.g.

$ module load GCC/10.3.0
$ module load MPICH/3.4.2

and then check the linked libraries again, I get

$ ldd /path/to/binary
    linux-vdso.so.1 => (0x00007fffe3d2c000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00002b4f58b6d000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b4f58d71000)
    libmpi.so.12 => /Applic.HPC/Easybuild/skylake/2021a/software/MPICH/3.4.2-GCC-10.3.0/lib/libmpi.so.12 (0x00002b4f58f8d000)
    libmpifort.so.12 => /Applic.HPC/Easybuild/skylake/2021a/software/MPICH/3.4.2-GCC-10.3.0/lib/libmpifort.so.12 (0x00002b4f58977000)
    libm.so.6 => /lib64/libm.so.6 (0x00002b4f59ee4000)
    librt.so.1 => /lib64/librt.so.1 (0x00002b4f5a1e6000)
    libc.so.6 => /lib64/libc.so.6 (0x00002b4f5a3ee000)
    /lib64/ld-linux-x86-64.so.2 (0x00002b4f58949000)

so the correct paths to the libraries are now found.

HOWEVER, I cannot reproduce this inside an sbatch job I submitted. When the job checks for the shared libs via ldd, the MPI libraries are reported as not found.
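
A quick way to compare the interactive and the batch environment is to check which LD_LIBRARY_PATH entries actually contain the library at the moment ldd runs. The following is only a minimal sketch, assuming bash; find_lib_in_path is a hypothetical helper name (not part of Slurm or Lmod), and it only looks at LD_LIBRARY_PATH, not at RPATH/RUNPATH entries or the ld.so cache:

```shell
#!/bin/bash
# Hypothetical helper: print every directory of a colon-separated search
# path that contains the given library file.
#   $1  library file name, e.g. libmpi.so.12
#   $2  search path (optional, defaults to $LD_LIBRARY_PATH)
# Returns 0 if at least one directory matched, 1 otherwise.
find_lib_in_path() {
    local lib="$1"
    local search_path="${2-${LD_LIBRARY_PATH-}}"
    local -a dirs
    local dir
    local status=1

    # Split the search path on ':' into an array.
    IFS=':' read -r -a dirs <<< "$search_path"

    for dir in "${dirs[@]}"; do
        # Empty entries mean "current directory" to the dynamic loader;
        # this sketch simply ignores them.
        [ -n "$dir" ] || continue
        if [ -e "$dir/$lib" ]; then
            printf '%s\n' "$dir"
            status=0
        fi
    done
    return "$status"
}
```

Calling find_lib_in_path libmpi.so.12 right before ldd, once in the interactive shell and once in the batch script, would show whether the MPICH lib directory is part of LD_LIBRARY_PATH in both cases.
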

The job script looks more or less like this:

####################################################
#!/bin/bash
#SBATCH --partition admin
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --time=10:00

module load GCC/10.3.0
module load MPICH/3.4.2

ldd /path/to/binary
####################################################

So nothing too complicated. I tested this with other, self-compiled binaries, which all seem to work just fine. Unfortunately, this one is a closed-source binary blob - so I cannot recompile it.

One interesting thing: when I do not load any environment modules, but just set the LD_LIBRARY_PATH variable directly to the correct path before calling ldd, i.e.

LD_LIBRARY_PATH=/Applic.HPC/Easybuild/skylake/2021a/software/MPICH/3.4.2-GCC-10.3.0/lib ldd /path/to/binary

it works as intended - also in the batch job.

Can anyone make sense of this? Can there be something hard-coded into the binary that prevents it from using an exported LD_LIBRARY_PATH? And why would it work interactively, but not in a batch job?

Many thanks
Sebastian