Manually log into one of your nodes. Load the modules you use in a batch job. Run 'ldd' on your executable. Start at the bottom and work upwards...
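A minimal sketch of that check, assuming a POSIX shell with awk on the node. The helper name `missing_libs` is made up for illustration and is not part of ldd or Open MPI. It reads ldd output on stdin and prints only the libraries the dynamic linker could not resolve:

```shell
# Hypothetical helper: filter `ldd` output (read from stdin) down to the
# names of the shared libraries reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example use on a node (path and host name taken from the quoted message):
#   ldd /usr/mpi/icc/openmpi-icc/bin/orted | missing_libs
#   ssh awbp026 ldd /usr/mpi/icc/openmpi-icc/bin/orted | missing_libs
```

Empty output means every library resolved in that environment. Any name printed is one the loader could not find there.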
By the way, have you looked at using Easybuild? Would be good to have your input there maybe.

On Wed, 7 Apr 2021 at 01:01, Heinz, Michael William via users <users@lists.open-mpi.org> wrote:

> I’m having a heck of a time building OMPI with Intel C. Compilation goes
> fine, installation goes fine, compiling test apps (the OSU benchmarks) goes
> fine…
>
> but when I go to actually run an MPI app I get:
>
> [awbp025:~/work/osu-icc](N/A)$ /usr/mpi/icc/openmpi-icc/bin/mpirun -np 2 -H awbp025,awbp026,awbp027,awbp028 -x FI_PROVIDER=opa1x -x LD_LIBRARY_PATH=/usr/mpi/icc/openmpi-icc/lib64:/lib hostname
> /usr/mpi/icc/openmpi-icc/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
> /usr/mpi/icc/openmpi-icc/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
>
> Looking at orted, it does seem like the binary is linking correctly:
>
> [awbp025:~/work/osu-icc](N/A)$ /usr/mpi/icc/openmpi-icc/bin/orted
> [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 135
> [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 107
> [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 346
> [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file base/ess_base_std_orted.c at line 264
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   orte_session_dir failed
>   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
>
> and…
>
> [awbp025:~/work/osu-icc](N/A)$ ldd /usr/mpi/icc/openmpi-icc/bin/orted
>     linux-vdso.so.1 (0x00007fffc2ebf000)
>     libopen-rte.so.40 => /usr/mpi/icc/openmpi-icc/lib/libopen-rte.so.40 (0x00007fdaa6404000)
>     libopen-pal.so.40 => /usr/mpi/icc/openmpi-icc/lib/libopen-pal.so.40 (0x00007fdaa60bd000)
>     libopen-orted-mpir.so => /usr/mpi/icc/openmpi-icc/lib/libopen-orted-mpir.so (0x00007fdaa5ebb000)
>     libm.so.6 => /lib64/libm.so.6 (0x00007fdaa5b39000)
>     librt.so.1 => /lib64/librt.so.1 (0x00007fdaa5931000)
>     libutil.so.1 => /lib64/libutil.so.1 (0x00007fdaa572d000)
>     libz.so.1 => /lib64/libz.so.1 (0x00007fdaa5516000)
>     libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fdaa52fe000)
>     libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdaa50de000)
>     libc.so.6 => /lib64/libc.so.6 (0x00007fdaa4d1b000)
>     libdl.so.2 => /lib64/libdl.so.2 (0x00007fdaa4b17000)
>     libimf.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libimf.so (0x00007fdaa4494000)
>     libsvml.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libsvml.so (0x00007fdaa29c4000)
>     libirng.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libirng.so (0x00007fdaa2659000)
>     libintlc.so.5 => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x00007fdaa23e1000)
>     /lib64/ld-linux-x86-64.so.2 (0x00007fdaa66d6000)
>
> Can anyone suggest what I’m forgetting to do?
>
> ---
> Michael Heinz
> Fabric Software Engineer, Cornelis Networks