Hi, I link it statically so that the compiler's libraries will not be a runtime dependency, and I have done it this way for years. The developers said they want it this way, so I did.
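
Just to be clear, by "statically" I mean only the Intel runtime, via -static-intel; system and MPI libraries stay shared. A minimal sketch of what I mean (hello.c is just a placeholder for any of our codes):

# Link the Intel-provided runtime libraries statically; everything else stays shared.
icc -O3 -static-intel -o hello hello.c
# ldd should then no longer list the Intel runtime libraries (libimf, libsvml, libintlc, ...).
ldd ./hello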
I saw that warning; that library is related to Omni-Path, which we don't have. I will try rebuilding with just --enable-shared, as you suggest; I put a sketch of that rebuild, and of how I plan to test it, at the bottom of this message.

---
Leandro

On Tue, May 12, 2020 at 8:27 AM Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> It looks like you are building both static and dynamic libraries
> (--enable-static and --enable-shared). This might be confusing the issue
> -- I can see at least one warning:
>
>   icc: warning #10237: -lcilkrts linked in dynamically, static library not available
>
> It's not easy to tell from the snippets you sent what other downstream
> side effects this might have.
>
> Is there a reason to compile statically? It generally leads to (much)
> bigger executables, and far less memory efficiency (i.e., the library is
> not shared in memory between all the MPI processes running on each node).
> Also, the link phase of compilers tends to prefer shared libraries, so
> unless your apps are compiled/linked with whatever the compiler's "link
> this statically" flags are, it's going to likely default to using the
> shared libraries.
>
> This is a long way of saying: try building everything with just
> --enable-shared (and not --enable-static). Or possibly just remove both
> flags; --enable-shared is the default.
>
>
> On May 11, 2020, at 9:23 AM, Leandro via users <users@lists.open-mpi.org> wrote:
>
> Hi,
>
> I'm trying to start using Slurm. I followed all the instructions to build
> PMIx and Slurm using PMIx, but I can't get Open MPI to work.
>
> According to the PMIx documentation, I should compile Open MPI using
> "--with-ompi-pmix-rte", but when I tried, it failed. I need to build this
> as CentOS RPMs.
>
> Thanks in advance for your help. I pasted some info below.
>
> libtool: link: /tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icc
> -std=gnu99 -std=gnu99 -DOPAL_CONFIGURE_USER=\"root\" -DOPAL_CONFIGURE_HOST=\"gr10b17n05\"
> "-DOPAL_CONFIGURE_DATE=\"Fri May 8 13:35:51 -03 2020\"" -DOMPI_BUILD_USER=\"root\"
> -DOMPI_BUILD_HOST=\"gr10b17n05\" "-DOMPI_BUILD_DATE=\"Fri May 8 13:47:32 -03 2020\""
> "-DOMPI_BUILD_CFLAGS=\"-DNDEBUG -O3 -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread\""
> "-DOMPI_BUILD_CPPFLAGS=\"-I../../.. -I../../../orte/include \""
> "-DOMPI_BUILD_CXXFLAGS=\"-DNDEBUG -O3 -finline-functions -pthread\""
> "-DOMPI_BUILD_CXXCPPFLAGS=\"-I../../..
\"" "-DOMPI_BUILD_FFLAGS=\"-O2 -g > -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong > --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic > -I/usr/lib64/gfortran/modules\"" -DOMPI_BUILD_FCFLAGS=\"-O3\" > "-DOMPI_BUILD_LDFLAGS=\"-Wc,-static-intel -static-intel -L/usr/lib64\"" > "-DOMPI_BUILD_LIBS=\"-lrt -lutil -lz -lhwloc -levent -levent_pthreads\"" > -DOPAL_CC_ABSOLUTE=\"\" -DOMPI_CXX_ABSOLUTE=\"none\" -DNDEBUG -O3 > -finline-functions -fno-strict-aliasing -restrict > -Qoption,cpp,--extended_float_types -pthread -static-intel -static-intel -o > .libs/ompi_info ompi_info.o param.o -L/usr/lib64 > ../../../ompi/.libs/libmpi.so -L/usr/lib -llustreapi > /root/rpmbuild/BUILD/openmpi-4.0.2/opal/.libs/libopen-pal.so > ../../../opal/.libs/libopen-pal.so -lfabric -lucp -lucm -lucs -luct > -lrdmacm -libverbs /usr/lib64/libpmix.so -lmunge -lrt -lutil -lz > /usr/lib64/libhwloc.so -lm -ludev -lltdl -levent -levent_pthreads -pthread > -Wl,-rpath -Wl,/usr/lib64 > icc: warning #10237: -lcilkrts linked in dynamically, static library not > available > ../../../ompi/.libs/libmpi.so: undefined reference to `orte_process_info' > ../../../ompi/.libs/libmpi.so: undefined reference to `orte_show_help' > make[2]: *** [ompi_info] Error 1 > make[2]: Leaving directory > `/root/rpmbuild/BUILD/openmpi-4.0.2/ompi/tools/ompi_info' > make[1]: *** [all-recursive] Error 1 > make[1]: Leaving directory `/root/rpmbuild/BUILD/openmpi-4.0.2/ompi' > make: *** [all-recursive] Error 1 > error: Bad exit status from /var/tmp/rpm-tmp.RyklCR (%build) > > The orte libraries are missing. When I don't use "-with-ompi-pmix-rte" it > builds, but neither mpirun or srun works: > > c315@gr10b17n05 /bw1nfs1/Projetos1/c315/Meus_testes > cat machine_file > gr10b17n05 > gr10b17n06 > gr10b17n07 > gr10b17n08 > c315@gr10b17n05 /bw1nfs1/Projetos1/c315/Meus_testes > mpirun -machinefile > machine_file ./mpihello > [gr10b17n07:115065] [[21391,0],2] ORTE_ERROR_LOG: Not found in file > base/ess_base_std_orted.c at line 362 > -------------------------------------------------------------------------- > It looks like orte_init failed for some reason; your parallel process is > likely to abort. There are many reasons that a parallel process can > fail during orte_init; some of which are due to configuration or > environment problems. This failure appears to be an internal failure; > here's some additional information (which may only be relevant to an > Open MPI developer): > > opal_pmix_base_select failed > --> Returned value Not found (-13) instead of ORTE_SUCCESS > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > ORTE was unable to reliably start one or more daemons. > This usually is caused by: > > * not finding the required libraries and/or binaries on > one or more nodes. Please check your PATH and LD_LIBRARY_PATH > settings, or configure OMPI with --enable-orterun-prefix-by-default > > * lack of authority to execute on one or more specified nodes. > Please verify your allocation and authorities. > > * the inability to write startup files into /tmp > (--tmpdir/orte_tmpdir_base). > Please check with your sys admin to determine the correct location to > use. > > * compilation of the orted with dynamic libraries when static are required > (e.g., on Cray). Please check your configure cmd line and consider using > one of the contrib/platform definitions for your system type. 
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
> --------------------------------------------------------------------------
> [gr10b17n08:142030] [[21391,0],3] ORTE_ERROR_LOG: Not found in file base/ess_base_std_orted.c at line 362
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_pmix_base_select failed
>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> ORTE does not know how to route a message to the specified daemon
> located on the indicated node:
>
>   my node: gr10b17n05
>   target node: gr10b17n06
>
> This is usually an internal programming error that should be
> reported to the developers. In the meantime, a workaround may
> be to set the MCA param routed=direct on the command line or
> in your environment. We apologize for the problem.
> --------------------------------------------------------------------------
> [gr10b17n05:171586] 1 more process has sent help message help-errmgr-base.txt / no-path
> [gr10b17n05:171586] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> c315@gr10b17n05 /bw1nfs1/Projetos1/c315/Meus_testes >
>
> --------------------------
> c315@gr10pbs2 /bw1nfs1/Projetos1/c315/Meus_testes > mpirun --nolocal -np 1 --machinefile machine_file mpihello
> [gr10pbs2:242828] [[60566,0],0] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 320
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_pmix_base_select failed
>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> c315@gr10pbs2 /bw1nfs1/Projetos1/c315/Meus_testes > mpirun --nolocal -np 1 --machinefile machine_file mpihello
> [gr10pbs2:237314] [[50968,0],0] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 320
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.
> This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_pmix_base_select failed
>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> c315@gr10pbs2 /bw1nfs1/Projetos1/c315/Meus_testes >
>
> c315@gr10pbs2 /bw1nfs1/Projetos1/c315/Meus_testes > srun -N4 /bw1nfs1/Projetos1/c315/Meus_testes/mpihello
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
> [gr10b17n05:172693] Local abort before MPI_INIT completed completed
> successfully, but am not able to aggregate error messages, and not able to
> guarantee that all other processes were killed!
> srun: error: gr10b17n05: task 0: Exited with exit code 1
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   getting job size failed
>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   getting job size failed
>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   orte_ess_init failed
>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.
> This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   orte_ess_init failed
>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "Not found" (-13) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "Not found" (-13) instead of "Success" (0)
> --------------------------------------------------------------------------
> [gr10b17n07:116175] Local abort before MPI_INIT completed completed
> successfully, but am not able to aggregate error messages, and not able to
> guarantee that all other processes were killed!
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
> [gr10b17n06:142082] Local abort before MPI_INIT completed completed
> successfully, but am not able to aggregate error messages, and not able to
> guarantee that all other processes were killed!
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   getting job size failed
>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.
> This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   orte_ess_init failed
>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "Not found" (-13) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
> [gr10b17n08:143134] Local abort before MPI_INIT completed completed
> successfully, but am not able to aggregate error messages, and not able to
> guarantee that all other processes were killed!
> srun: error: gr10b17n07: task 2: Exited with exit code 1
> srun: error: gr10b17n06: task 1: Exited with exit code 1
> srun: error: gr10b17n08: task 3: Exited with exit code 1
> c315@gr10pbs2 /bw1nfs1/Projetos1/c315/Meus_testes >
>
> Slurm information:
> c315@gr10pbs2 /bw1nfs1/Projetos1/c315/Meus_testes > srun --mpi=list
> srun: MPI types are...
> srun: pmix_v3
> srun: none
> srun: pmi2
> srun: pmix
>
> The compilation lines used for PMIx and Open MPI:
>
> MAKEFLAGS="-j24 V=99" rpmbuild -ba --define 'install_in_opt 0' --define
> "configure_options --enable-shared --enable-static --with-jansson=/usr
> --with-libevent=/usr --with-libevent-libdir=/usr/lib64 --with-hwloc=/usr
> --with-curl=/usr --without-opamgt --with-munge=/usr --with-lustre=/usr
> --enable-pmix-timing --enable-pmi-backward-compatibility
> --enable-pmix-binaries --with-devel-headers --with-tests-examples
> --disable-mca-dso --disable-weak-symbols
> AR=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/xiar
> LD=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/xild
> CC=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icc
> FC=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort
> F90=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort
> F77=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort
> CXX=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icpc
> LDFLAGS='-Wc,-static-intel -static-intel' CFLAGS=-O3 FCFLAGS=-O3
> F77FLAGS=-O3 F90FLAGS=-O3 CXXFLAGS=-O3 MFLAGS='-j24 V99'" pmix-3.1.5.spec
>
> MAKEFLAGS="-j24 V=99" rpmbuild -ba --define 'install_in_opt 0' --define
> "configure_options --enable-shared --enable-static --with-libevent=/usr
> --with-libevent-libdir=/usr/lib64 --with-pmix=/usr
> --with-pmix-libdir=/usr/lib64 --enable-install-libpmix --with-ompi-pmix-rte
> --without-orte --with-slurm --with-ucx=/usr --with-cuda=/usr/local/cuda
> --with-gdrcopy=/usr --with-hwloc --enable-mpi-cxx --disable-mca-dso
> --enable-mpi-fortran --disable-weak-symbols
> --enable-mpi-thread-multiple --enable-contrib-no-build=vt
> --enable-mpirun-prefix-by-default --enable-orterun-prefix-by-default
> --with-cuda=/usr/local/cuda
> AR=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/xiar
> LD=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/xild
> CC=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icc
> FC=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort
> F90=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort
> F77=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort
> CXX=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icpc
> LDFLAGS='-Wc,-static-intel -static-intel' CFLAGS=-O3 FCFLAGS=-O3
> F77FLAGS=-O3 F90FLAGS=-O3 CXXFLAGS=-O3 MFLAGS='-j24 V99'"
> openmpi-4.0.2.spec 2>&1 | tee /root/openmpi-2.log
>
> ---
> Leandro
>
> --
> Jeff Squyres
> jsquy...@cisco.com
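
PS: Based on your suggestion, this is roughly the Open MPI rebuild I plan to try next. It is only a sketch of my rpmbuild line above, dropping --enable-static and --with-ompi-pmix-rte and keeping the external PMIx; the AR/LD/F77/F90 variables and the remaining flags would stay exactly as in the original command:

MAKEFLAGS="-j24 V=99" rpmbuild -ba --define 'install_in_opt 0' --define \
  "configure_options --enable-shared \
   --with-libevent=/usr --with-libevent-libdir=/usr/lib64 \
   --with-pmix=/usr --with-pmix-libdir=/usr/lib64 \
   --with-slurm --with-ucx=/usr --with-hwloc \
   --with-cuda=/usr/local/cuda --with-gdrcopy=/usr \
   --enable-mpi-cxx --enable-mpi-fortran --enable-mpi-thread-multiple \
   --enable-mpirun-prefix-by-default --enable-orterun-prefix-by-default \
   CC=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icc \
   CXX=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icpc \
   FC=/tgdesenv/dist/compiladores/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort \
   LDFLAGS='-Wc,-static-intel -static-intel' CFLAGS=-O3 FCFLAGS=-O3 CXXFLAGS=-O3" \
  openmpi-4.0.2.spec 2>&1 | tee /root/openmpi-2.log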
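
If that build goes through, my understanding is that the checks would look like this (again just a sketch; mpihello and machine_file are the same test files as above, and I am assuming the pmix_v3 plugin reported by "srun --mpi=list" is the right one to pair with PMIx 3.1.5):

# Confirm the new build picked up the external PMIx support
ompi_info | grep -i pmix

# Launch through Open MPI's own mpirun, as before
mpirun -machinefile machine_file ./mpihello

# Direct launch through Slurm, explicitly selecting the PMIx plugin
srun -N4 --mpi=pmix_v3 /bw1nfs1/Projetos1/c315/Meus_testes/mpihello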