:-) For the web archives: Mike confirmed to me off-list that the non-interactive login setup was, indeed, the issue, and he's now good to go.
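
The thread does not record which change Mike actually made. One common way to address this kind of problem, sketched here under assumptions, is to set LD_LIBRARY_PATH near the top of the shell startup file, before any test that returns early for non-interactive shells. prepend_ld_path is a hypothetical helper name; the directory is the one that ldd reported for libimf.so in the original message.

```shell
# Hypothetical fragment for a shell startup file such as ~/.bashrc.
# INTEL_LIB is the directory ldd reported for libimf.so on the head node.
INTEL_LIB=/opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin

# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;   # already present, leave the variable unchanged
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Assumption: this runs before any "return if not interactive" guard,
# so shells that read this file for an ssh command also get the setting.
prepend_ld_path "$INTEL_LIB"
```

Whether a given startup file is read at all for ssh commands depends on the shell and on how it was built, so the result is worth confirming with the ssh/ldd check quoted in this thread.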

On Apr 7, 2021, at 10:09 AM, John Hearns <hear...@gmail.com> wrote:

Jeff, you know as well as I do that EVERYTHING is in the path at Cornelis Networks.

On Wed, 7 Apr 2021 at 14:59, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

Check the output from ldd in a non-interactive login: your LD_LIBRARY_PATH probably doesn't include the location of the Intel runtime. E.g.

    ssh othernode ldd /path/to/orted

Your shell startup files may well differentiate between interactive and non-interactive logins (i.e., it may set PATH / LD_LIBRARY_PATH / etc. differently).

On Apr 7, 2021, at 7:21 AM, John Hearns via users <users@lists.open-mpi.org> wrote:

Manually log into one of your nodes. Load the modules you use in a batch job. Run 'ldd' on your executable.
Start at the bottom and work upwards...

By the way, have you looked at using Easybuild? Would be good to have your input there maybe.

On Wed, 7 Apr 2021 at 01:01, Heinz, Michael William via users <users@lists.open-mpi.org> wrote:

I’m having a heck of a time building OMPI with Intel C.
Compilation goes fine, installation goes fine, compiling test apps (the OSU benchmarks) goes fine… but when I go to actually run an MPI app I get:

    [awbp025:~/work/osu-icc](N/A)$ /usr/mpi/icc/openmpi-icc/bin/mpirun -np 2 -H awbp025,awbp026,awbp027,awbp028 -x FI_PROVIDER=opa1x -x LD_LIBRARY_PATH=/usr/mpi/icc/openmpi-icc/lib64:/lib hostname
    /usr/mpi/icc/openmpi-icc/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
    /usr/mpi/icc/openmpi-icc/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory

Looking at orted, it does seem like the binary is linking correctly:

    [awbp025:~/work/osu-icc](N/A)$ /usr/mpi/icc/openmpi-icc/bin/orted
    [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 135
    [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 107
    [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 346
    [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file base/ess_base_std_orted.c at line 264
    --------------------------------------------------------------------------
    It looks like orte_init failed for some reason; your parallel process is
    likely to abort. There are many reasons that a parallel process can
    fail during orte_init; some of which are due to configuration or
    environment problems. This failure appears to be an internal failure;
    here's some additional information (which may only be relevant to an
    Open MPI developer):

      orte_session_dir failed
      --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
    --------------------------------------------------------------------------

and…

    [awbp025:~/work/osu-icc](N/A)$ ldd /usr/mpi/icc/openmpi-icc/bin/orted
        linux-vdso.so.1 (0x00007fffc2ebf000)
        libopen-rte.so.40 => /usr/mpi/icc/openmpi-icc/lib/libopen-rte.so.40 (0x00007fdaa6404000)
        libopen-pal.so.40 => /usr/mpi/icc/openmpi-icc/lib/libopen-pal.so.40 (0x00007fdaa60bd000)
        libopen-orted-mpir.so => /usr/mpi/icc/openmpi-icc/lib/libopen-orted-mpir.so (0x00007fdaa5ebb000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fdaa5b39000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fdaa5931000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007fdaa572d000)
        libz.so.1 => /lib64/libz.so.1 (0x00007fdaa5516000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fdaa52fe000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdaa50de000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fdaa4d1b000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fdaa4b17000)
        libimf.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libimf.so (0x00007fdaa4494000)
        libsvml.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libsvml.so (0x00007fdaa29c4000)
        libirng.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libirng.so (0x00007fdaa2659000)
        libintlc.so.5 => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x00007fdaa23e1000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fdaa66d6000)

Can anyone suggest what I’m forgetting to do?

---
Michael Heinz
Fabric Software Engineer, Cornelis Networks

--
Jeff Squyres
jsquy...@cisco.com
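
The ldd comparison suggested in this thread is easier to read when only the unresolved entries are shown. The following is a minimal sketch, assuming ldd marks an unresolved library as "=> not found" (as GNU ldd does); missing_libs is a hypothetical helper name and not part of Open MPI.

```shell
# Print the names of shared libraries that ldd could not resolve.
# Reads ldd output on stdin and prints one library name per line.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Assumed usage, with a host and path taken from this thread:
#   ldd /usr/mpi/icc/openmpi-icc/bin/orted | missing_libs
#   ssh awbp026 ldd /usr/mpi/icc/openmpi-icc/bin/orted | missing_libs
```

If the ssh form lists libimf.so while the local form prints nothing, the non-interactive environment on the remote node is not picking up the Intel runtime directory.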