Jeff, you know as well as I do that EVERYTHING is in the path at Cornelis
Networks.
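
Jeff's point about shell startup files can be illustrated with a '~/.bashrc' fragment. Many '.bashrc' files stop early for non-interactive shells (for example with '[[ $- != *i* ]] && return'). Anything exported after that line never reaches the non-interactive ssh session that mpirun typically uses to start orted. The sketch below assumes bash is the login shell on the compute nodes (bash normally reads '~/.bashrc' for commands run through sshd). It reuses the Intel runtime directory from the ldd output quoted below. The variable name 'intel_rt' is only illustrative.

```shell
# ~/.bashrc fragment (sketch). Exports that non-interactive sessions need,
# such as the ssh session that starts orted, go first.
intel_rt=/opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin
case ":${LD_LIBRARY_PATH:-}:" in
    *":${intel_rt}:"*) ;;    # already present: don't add it twice
    *) export LD_LIBRARY_PATH="${intel_rt}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
esac
unset intel_rt

# Settings that only matter at a terminal go inside this test;
# $- contains "i" only when the shell is interactive.
case $- in
    *i*) PS1='[\h:\w]\$ ' ;;    # example interactive-only setting
esac
```

The design point is to place the export ahead of any interactive-only section. The first 'case' test keeps the directory from being prepended twice when the file is read more than once.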

On Wed, 7 Apr 2021 at 14:59, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
wrote:

> Check the output from ldd in a non-interactive login: your LD_LIBRARY_PATH
> probably doesn't include the location of the Intel runtime.
>
> E.g.
>
>     ssh othernode ldd /path/to/orted
>
> Your shell startup files may well differentiate between interactive and
> non-interactive logins (i.e., they may set PATH / LD_LIBRARY_PATH / etc.
> differently).
>
>
> On Apr 7, 2021, at 7:21 AM, John Hearns via users <
> users@lists.open-mpi.org> wrote:
>
> Manually log into one of your nodes. Load the modules you use in a batch
> job. Run 'ldd' on your executable.
> Start at the bottom and work upwards...
>
> By the way, have you looked at using EasyBuild? Would be good to have your
> input there maybe.
>
>
> On Wed, 7 Apr 2021 at 01:01, Heinz, Michael William via users <
> users@lists.open-mpi.org> wrote:
>
>> I’m having a heck of a time building OMPI with Intel C. Compilation goes
>> fine, installation goes fine, compiling test apps (the OSU benchmarks) goes
>> fine…
>>
>> but when I go to actually run an MPI app I get:
>>
>> [awbp025:~/work/osu-icc](N/A)$ /usr/mpi/icc/openmpi-icc/bin/mpirun -np 2 \
>>     -H awbp025,awbp026,awbp027,awbp028 -x FI_PROVIDER=opa1x \
>>     -x LD_LIBRARY_PATH=/usr/mpi/icc/openmpi-icc/lib64:/lib hostname
>>
>> /usr/mpi/icc/openmpi-icc/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
>> /usr/mpi/icc/openmpi-icc/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
>>
>> Looking at orted, it does seem like the binary is linking correctly:
>>
>> [awbp025:~/work/osu-icc](N/A)$ /usr/mpi/icc/openmpi-icc/bin/orted
>> [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 135
>> [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 107
>> [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 346
>> [awbp025:620372] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file base/ess_base_std_orted.c at line 264
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems.  This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>>   orte_session_dir failed
>>   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>>
>> and…
>>
>> [awbp025:~/work/osu-icc](N/A)$ ldd /usr/mpi/icc/openmpi-icc/bin/orted
>>         linux-vdso.so.1 (0x00007fffc2ebf000)
>>         libopen-rte.so.40 => /usr/mpi/icc/openmpi-icc/lib/libopen-rte.so.40 (0x00007fdaa6404000)
>>         libopen-pal.so.40 => /usr/mpi/icc/openmpi-icc/lib/libopen-pal.so.40 (0x00007fdaa60bd000)
>>         libopen-orted-mpir.so => /usr/mpi/icc/openmpi-icc/lib/libopen-orted-mpir.so (0x00007fdaa5ebb000)
>>         libm.so.6 => /lib64/libm.so.6 (0x00007fdaa5b39000)
>>         librt.so.1 => /lib64/librt.so.1 (0x00007fdaa5931000)
>>         libutil.so.1 => /lib64/libutil.so.1 (0x00007fdaa572d000)
>>         libz.so.1 => /lib64/libz.so.1 (0x00007fdaa5516000)
>>         libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fdaa52fe000)
>>         libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdaa50de000)
>>         libc.so.6 => /lib64/libc.so.6 (0x00007fdaa4d1b000)
>>         libdl.so.2 => /lib64/libdl.so.2 (0x00007fdaa4b17000)
>>         libimf.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libimf.so (0x00007fdaa4494000)
>>         libsvml.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libsvml.so (0x00007fdaa29c4000)
>>         libirng.so => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libirng.so (0x00007fdaa2659000)
>>         libintlc.so.5 => /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x00007fdaa23e1000)
>>         /lib64/ld-linux-x86-64.so.2 (0x00007fdaa66d6000)
>>
>> Can anyone suggest what I’m forgetting to do?
>>
>> ---
>>
>> Michael Heinz
>> Fabric Software Engineer, Cornelis Networks
>>
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
>
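
Jeff's 'ssh othernode ldd /path/to/orted' check comes down to finding the lines ldd marks as 'not found' when it runs in the same kind of non-interactive session that starts orted. This is a minimal sketch that assumes a POSIX shell and awk on the machine running it. 'missing_libs' is a hypothetical helper name, not an Open MPI tool.

```shell
# missing_libs: hypothetical helper, not part of Open MPI.
# Reads ldd output on stdin and prints the name of every shared library the
# dynamic loader could not resolve; ldd reports those as "libfoo.so => not found".
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}
```

For example, 'ssh awbp026 ldd /usr/mpi/icc/openmpi-icc/bin/orted | missing_libs' runs ldd through a non-interactive ssh login, as in Jeff's example, and filters the result locally. The Intel runtime directory might be missing from that session's LD_LIBRARY_PATH. In that case, libimf.so (and possibly the other Intel libraries) should be listed. Empty output means every library resolved.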
