Try adding --without-psm2 to the PMIx configure line - it sounds like you have that library installed on your machine, even though you don't have Omni-Path hardware.
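For illustration, a minimal sketch (not a verified recipe) of where the flag would go, based on the PMIx rpmbuild line quoted at the bottom of this thread; "..." abbreviates the remaining configure_options from that original command.

  # Sketch only: the pmix-3.1.5 rpmbuild line from below, with --without-psm2
  # appended to configure_options; "..." stands for the other original flags.
  MAKEFLAGS="-j24 V=99" rpmbuild -ba --define 'install_in_opt 0' \
    --define "configure_options --enable-shared --enable-static --without-psm2 ..." \
    pmix-3.1.5.spec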
On May 12, 2020, at 4:42 AM, Leandro via users <users@lists.open-mpi.org> wrote:

Hi,

I compile it statically to make sure the compiler's libraries will not be a dependency, and I have done it this way for years. The developers said they want it this way, so I did. I saw this warning, and that library is related to Omni-Path, which we don't have.

---
Leandro

On Tue, May 12, 2020 at 8:27 AM Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

It looks like you are building both static and dynamic libraries (--enable-static and --enable-shared). This might be confusing the issue -- I can see at least one warning:

icc: warning #10237: -lcilkrts linked in dynamically, static library not available

It's not easy to tell from the snippets you sent what other downstream side effects this might have.

Is there a reason to compile statically? It generally leads to (much) bigger executables and far less memory efficiency (i.e., the library is not shared in memory between all the MPI processes running on each node). Also, the link phase of compilers tends to prefer shared libraries, so unless your apps are compiled/linked with whatever the compiler's "link this statically" flags are, it is likely going to default to using the shared libraries.

This is a long way of saying: try building everything with just --enable-shared (and not --enable-static). Or possibly just remove both flags; --enable-shared is the default.

On May 11, 2020, at 9:23 AM, Leandro via users <users@lists.open-mpi.org> wrote:

Hi,

I'm trying to start using Slurm. I followed all the instructions to build PMIx and Slurm with pmix support, but I can't get Open MPI to work. According to the PMIx documentation, I should compile Open MPI using "--with-ompi-pmix-rte", but when I try that, it fails. I need to build this as CentOS RPMs. Thanks in advance for your help. I pasted some info below.

libtool: link: /tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icc -std=gnu99 -std=gnu99 -DOPAL_CONFIGURE_USER=\"root\" -DOPAL_CONFIGURE_HOST=\"gr10b17n05\" "-DOPAL_CONFIGURE_DATE=\"Fri May 8 13:35:51 -03 2020\"" -DOMPI_BUILD_USER=\"root\" -DOMPI_BUILD_HOST=\"gr10b17n05\" "-DOMPI_BUILD_DATE=\"Fri May 8 13:47:32 -03 2020\"" "-DOMPI_BUILD_CFLAGS=\"-DNDEBUG -O3 -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread\"" "-DOMPI_BUILD_CPPFLAGS=\"-I../../.. -I../../../orte/include \"" "-DOMPI_BUILD_CXXFLAGS=\"-DNDEBUG -O3 -finline-functions -pthread\"" "-DOMPI_BUILD_CXXCPPFLAGS=\"-I../../..
\"" "-DOMPI_BUILD_FFLAGS=\"-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -I/usr/lib64/gfortran/modules\"" -DOMPI_BUILD_FCFLAGS=\"-O3\" "-DOMPI_BUILD_LDFLAGS=\"-Wc,-static-intel -static-intel -L/usr/lib64\"" "-DOMPI_BUILD_LIBS=\"-lrt -lutil -lz -lhwloc -levent -levent_pthreads\"" -DOPAL_CC_ABSOLUTE=\"\" -DOMPI_CXX_ABSOLUTE=\"none\" -DNDEBUG -O3 -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread -static-intel -static-intel -o .libs/ompi_info ompi_info.o param.o -L/usr/lib64 ../../../ompi/.libs/libmpi.so -L/usr/lib -llustreapi /root/rpmbuild/BUILD/openmpi-4.0.2/opal/.libs/libopen-pal.so ../../../opal/.libs/libopen-pal.so -lfabric -lucp -lucm -lucs -luct -lrdmacm -libverbs /usr/lib64/libpmix.so -lmunge -lrt -lutil -lz /usr/lib64/libhwloc.so -lm -ludev -lltdl -levent -levent_pthreads -pthread -Wl,-rpath -Wl,/usr/lib64 icc: warning #10237: -lcilkrts linked in dynamically, static library not available ../../../ompi/.libs/libmpi.so: undefined reference to `orte_process_info' ../../../ompi/.libs/libmpi.so: undefined reference to `orte_show_help' make[2]: *** [ompi_info] Error 1 make[2]: Leaving directory `/root/rpmbuild/BUILD/openmpi-4.0.2/ompi/tools/ompi_info' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/root/rpmbuild/BUILD/openmpi-4.0.2/ompi' make: *** [all-recursive] Error 1 error: Bad exit status from /var/tmp/rpm-tmp.RyklCR (%build) The orte libraries are missing. When I don't use "-with-ompi-pmix-rte" it builds, but neither mpirun or srun works: c315@gr10b17n05 /bw1nfs1/Projetos1/c315/Meus_testes > cat machine_file gr10b17n05 gr10b17n06 gr10b17n07 gr10b17n08 c315@gr10b17n05 /bw1nfs1/Projetos1/c315/Meus_testes > mpirun -machinefile machine_file ./mpihello [gr10b17n07:115065] [[21391,0],2] ORTE_ERROR_LOG: Not found in file base/ess_base_std_orted.c at line 362 -------------------------------------------------------------------------- It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): opal_pmix_base_select failed --> Returned value Not found (-13) instead of ORTE_SUCCESS -------------------------------------------------------------------------- -------------------------------------------------------------------------- ORTE was unable to reliably start one or more daemons. This usually is caused by: * not finding the required libraries and/or binaries on one or more nodes. Please check your PATH and LD_LIBRARY_PATH settings, or configure OMPI with --enable-orterun-prefix-by-default * lack of authority to execute on one or more specified nodes. Please verify your allocation and authorities. * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base). Please check with your sys admin to determine the correct location to use. * compilation of the orted with dynamic libraries when static are required (e.g., on Cray). Please check your configure cmd line and consider using one of the contrib/platform definitions for your system type. * an inability to create a connection back to mpirun due to a lack of common network interfaces and/or no route found between them. 
Please check network connectivity (including firewalls and network routing requirements).
--------------------------------------------------------------------------
[gr10b17n08:142030] [[21391,0],3] ORTE_ERROR_LOG: Not found in file base/ess_base_std_orted.c at line 362
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  opal_pmix_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
ORTE does not know how to route a message to the specified daemon located on the indicated node:

  my node:     gr10b17n05
  target node: gr10b17n06

This is usually an internal programming error that should be reported to the developers. In the meantime, a workaround may be to set the MCA param routed=direct on the command line or in your environment. We apologize for the problem.
--------------------------------------------------------------------------
[gr10b17n05:171586] 1 more process has sent help message help-errmgr-base.txt / no-path
[gr10b17n05:171586] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
c315@gr10b17n05 /bw1nfs1/Projetos1/c315/Meus_testes >

--------------------------

c315@gr10pbs2 /bw1nfs1/Projetos1/c315/Meus_testes > mpirun --nolocal -np 1 --machinefile machine_file mpihello
[gr10pbs2:242828] [[60566,0],0] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 320
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  opal_pmix_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
c315@gr10pbs2 /bw1nfs1/Projetos1/c315/Meus_testes > mpirun --nolocal -np 1 --machinefile machine_file mpihello
[gr10pbs2:237314] [[50968,0],0] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 320
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems.
This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  opal_pmix_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
c315@gr10pbs2 /bw1nfs1/Projetos1/c315/Meus_testes >
c315@gr10pbs2 /bw1nfs1/Projetos1/c315/Meus_testes > srun -N4 /bw1nfs1/Projetos1/c315/Meus_testes/mpihello
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[gr10b17n05:172693] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
srun: error: gr10b17n05: task 0: Exited with exit code 1
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  getting job size failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  getting job size failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  orte_ess_init failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  orte_ess_init failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort.
There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[gr10b17n07:116175] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[gr10b17n06:142082] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  getting job size failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  orte_ess_init failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems.
This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[gr10b17n08:143134] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
srun: error: gr10b17n07: task 2: Exited with exit code 1
srun: error: gr10b17n06: task 1: Exited with exit code 1
srun: error: gr10b17n08: task 3: Exited with exit code 1
c315@gr10pbs2 /bw1nfs1/Projetos1/c315/Meus_testes >

Slurm information:

c315@gr10pbs2 /bw1nfs1/Projetos1/c315/Meus_testes > srun --mpi=list
srun: MPI types are...
srun: pmix_v3
srun: none
srun: pmi2
srun: pmix

The build commands used for PMIx and Open MPI:

MAKEFLAGS="-j24 V=99" rpmbuild -ba --define 'install_in_opt 0' --define "configure_options --enable-shared --enable-static --with-jansson=/usr --with-libevent=/usr --with-libevent-libdir=/usr/lib64 --with-hwloc=/usr --with-curl=/usr --without-opamgt --with-munge=/usr --with-lustre=/usr --enable-pmix-timing --enable-pmi-backward-compatibility --enable-pmix-binaries --with-devel-headers --with-tests-examples --disable-mca-dso --disable-weak-symbols AR=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/xiar LD=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/xild CC=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icc FC=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort F90=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort F77=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort CXX=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icpc LDFLAGS='-Wc,-static-intel -static-intel' CFLAGS=-O3 FCFLAGS=-O3 F77FLAGS=-O3 F90FLAGS=-O3 CXXFLAGS=-O3 MFLAGS='-j24 V99'" pmix-3.1.5.spec

MAKEFLAGS="-j24 V=99" rpmbuild -ba --define 'install_in_opt 0' --define "configure_options --enable-shared --enable-static --with-libevent=/usr --with-libevent-libdir=/usr/lib64 --with-pmix=/usr --with-pmix-libdir=/usr/lib64 --enable-install-libpmix --with-ompi-pmix-rte --without-orte --with-slurm --with-ucx=/usr --with-cuda=/usr/local/cuda --with-gdrcopy=/usr --with-hwloc --enable-mpi-cxx --disable-mca-dso --enable-mpi-fortran --disable-weak-symbols --enable-mpi-thread-multiple --enable-contrib-no-build=vt --enable-mpirun-prefix-by-default --enable-orterun-prefix-by-default --with-cuda=/usr/local/cuda AR=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/xiar LD=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/xild CC=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icc FC=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort F90=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort F77=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort
CXX=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icpc LDFLAGS='-Wc,-static-intel -static-intel' CFLAGS=-O3 FCFLAGS=-O3 F77FLAGS=-O3 F90FLAGS=-O3 CXXFLAGS=-O3 MFLAGS='-j24 V99'" openmpi-4.0.2.spec 2>&1 | tee /root/openmpi-2.log

---
Leandro

--
Jeff Squyres
jsquy...@cisco.com
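For reference, a minimal sketch (not a verified recipe) of the shared-only build suggested earlier in this thread: it is the same Open MPI rpmbuild line as above, with --enable-static removed and "..." standing in for the rest of the original configure_options.

  # Sketch only: same invocation as above, minus --enable-static;
  # "..." abbreviates the remaining flags from the original command.
  MAKEFLAGS="-j24 V=99" rpmbuild -ba --define 'install_in_opt 0' \
    --define "configure_options --enable-shared --with-pmix=/usr --with-pmix-libdir=/usr/lib64 --with-slurm --with-ucx=/usr ..." \
    openmpi-4.0.2.spec 2>&1 | tee /root/openmpi-2.log

When launching the rebuilt Open MPI under Slurm, it may also be worth explicitly selecting the PMIx plugin reported by "srun --mpi=list" above, e.g. "srun --mpi=pmix_v3 -N4 ./mpihello"; this is a suggestion to test, not a confirmed fix.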