Hi Guido,

Your PATH and LD_LIBRARY_PATH seem to be inconsistent with each other:

PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH

Hence, you may be mixing different versions of Open MPI.
It looks like you installed Open MPI 4.0.2 here:

/home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/

Have you tried this instead?

LD_LIBRARY_PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib:$LD_LIBRARY_PATH
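For example, the top of your job script could set both variables from the same
Open MPI 4.0.2 tree before calling mpirun (just a sketch on my part; OMPI_HOME is
only a convenience variable, and keeping the GCC 7.3.0 lib64 directory on
LD_LIBRARY_PATH for libgfortran is my assumption about your setup):

# Use one Open MPI 4.0.2 installation for both executables and libraries
OMPI_HOME=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2
export PATH=$OMPI_HOME/bin:$PATH
export LD_LIBRARY_PATH=$OMPI_HOME/lib:/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH

With those exported, "which mpirun" and "ldd ./flash4 | grep libmpi" should both
point into that openmpi-4.0.2 directory.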
I hope this helps,
Gus Correa

On Tue, Dec 10, 2019 at 4:40 PM Guido granda muñoz via users
<users@lists.open-mpi.org> wrote:

> Hello,
> I compiled the application now using openmpi-4.0.2:
>
> linux-vdso.so.1 => (0x00007fffb23ff000)
> libhdf5.so.103 => /home/guido/libraries/compiled_with_gcc-7.3.0/hdf5-1.10.5_serial/lib/libhdf5.so.103 (0x00002b3cd188c000)
> libz.so.1 => /lib64/libz.so.1 (0x00002b3cd1e74000)
> libmpi_usempif08.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_usempif08.so.40 (0x00002b3cd208a000)
> libmpi_usempi_ignore_tkr.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_usempi_ignore_tkr.so.40 (0x00002b3cd22c0000)
> libmpi_mpifh.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi_mpifh.so.40 (0x00002b3cd24c7000)
> libmpi.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libmpi.so.40 (0x00002b3cd2723000)
> libgfortran.so.4 => /share/apps/gcc-7.3.0/lib64/libgfortran.so.4 (0x00002b3cd2a55000)
> libm.so.6 => /lib64/libm.so.6 (0x00002b3cd2dc3000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b3cd3047000)
> libquadmath.so.0 => /share/apps/gcc-5.4.0/lib64/libquadmath.so.0 (0x00002b3cd325e000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b3cd349c000)
> libc.so.6 => /lib64/libc.so.6 (0x00002b3cd36b9000)
> librt.so.1 => /lib64/librt.so.1 (0x00002b3cd3a4e000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00002b3cd3c56000)
> libopen-rte.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libopen-rte.so.40 (0x00002b3cd3e5b000)
> libopen-pal.so.40 => /home/guido/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/lib/libopen-pal.so.40 (0x00002b3cd4110000)
> libudev.so.0 => /lib64/libudev.so.0 (0x00002b3cd4425000)
> libutil.so.1 => /lib64/libutil.so.1 (0x00002b3cd4634000)
> /lib64/ld-linux-x86-64.so.2 (0x00002b3cd166a000)
>
> and ran it like this:
>
> #!/bin/bash
> #PBS -l nodes=1:ppn=32
> #PBS -N mc_cond_0_h3
> #PBS -o mc_cond_0_h3.o
> #PBS -e mc_cond_0_h3.e
>
> PATH=$HOME/libraries/compiled_with_gcc-7.3.0/openmpi-4.0.2/bin:$PATH
> LD_LIBRARY_PATH=/share/apps/gcc-7.3.0/lib64:$LD_LIBRARY_PATH
> cd $PBS_O_WORKDIR
> mpirun -np 32 ./flash4
>
> and now I'm getting these error messages:
>
> --------------------------------------------------------------------------
> As of version 3.0.0, the "sm" BTL is no longer available in Open MPI.
>
> Efficient, high-speed same-node shared memory communication support in
> Open MPI is available in the "vader" BTL. To use the vader BTL, you
> can re-run your job with:
>
>     mpirun --mca btl vader,self,... your_mpi_application
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded).
> Note that Open MPI stopped checking at the first component that it did not find.
>
>   Host:      compute-0-34.local
>   Framework: btl
>   Component: sm
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   mca_bml_base_open() failed
>   --> Returned "Not found" (-13) instead of "Success" (0)
> --------------------------------------------------------------------------
> [compute-0-34:16915] *** An error occurred in MPI_Init
> [compute-0-34:16915] *** reported by process [3776708609,5]
> [compute-0-34:16915] *** on a NULL communicator
> [compute-0-34:16915] *** Unknown error
> [compute-0-34:16915] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [compute-0-34:16915] *** and potentially your MPI job)
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2147
> [compute-0-34.local:16902] 31 more processes have sent help message help-mpi-btl-sm.txt / btl sm is dead
> [compute-0-34.local:16902] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> [compute-0-34.local:16902] 31 more processes have sent help message help-mca-base.txt / find-available:not-valid
> [compute-0-34.local:16902] 31 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
> [compute-0-34.local:16902] 31 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
>
> /var/spool/torque/mom_priv/jobs/4110.mouruka.crya.privado.SC: line 11: /home/guido: is a directory
>
> Do you know what could cause this error?
>
> Thank you,
>
> On Fri, Dec 6, 2019 at 12:13, Jeff Squyres (jsquyres) (<jsquy...@cisco.com>) wrote:
>
>> On Dec 6, 2019, at 1:03 PM, Jeff Squyres (jsquyres) via users
>> <users@lists.open-mpi.org> wrote:
>> >
>> >> I get the same error when running on a single node. I will try to use
>> >> the latest version. Is there a way to check if different versions of Open MPI
>> >> were used on different nodes?
>> >
>> > mpirun -np 2 ompi_info | head
>> >
>> > Or something like that. With 1.10, I don't know/remember the mpirun
>> > CLI option to make one process per node (when ppn>1); you may have to check
>> > that. Or just "mpirun -np 33 ompi_info | head" and examine the output
>> > carefully to find the 33rd output and see if it's different.
>>
>> Poor quoting on my part. The intent was to see just the first few lines
>> from running `ompi_info` on each node.
>>
>> So maybe something like:
>>
>> ------
>> $ cat foo.sh
>> #!/bin/sh
>> ompi_info | head
>> $ mpirun -np 2 foo.sh
>> ------
>>
>> Or "mpirun -np 33 foo.sh", etc.
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>>
>
> --
> Guido