Hi Gilles,

I don't know what happened, but the files are not available now, and they were definitely available when I answered Ralph's email. The files also have different timestamps now. This is an extract from my email to Ralph for Solaris Sparc:
-rwxr-xr-x 1 root root     977 Apr 19 19:49 mca_plm_rsh.la
-rwxr-xr-x 1 root root  153280 Apr 19 19:49 mca_plm_rsh.so
-rwxr-xr-x 1 root root    1007 Apr 19 19:47 mca_pmix_pmix112.la
-rwxr-xr-x 1 root root 1400512 Apr 19 19:47 mca_pmix_pmix112.so
-rwxr-xr-x 1 root root     971 Apr 19 19:52 mca_pml_cm.la
-rwxr-xr-x 1 root root  342440 Apr 19 19:52 mca_pml_cm.so

Now I have the following output for these files:

-rwxr-xr-x 1 root root     976 Apr 19 19:58 mca_plm_rsh.la
-rwxr-xr-x 1 root root  319816 Apr 19 19:58 mca_plm_rsh.so
-rwxr-xr-x 1 root root     970 Apr 19 20:00 mca_pml_cm.la
-rwxr-xr-x 1 root root 1507440 Apr 19 20:00 mca_pml_cm.so

I'll try to find out what happened next week when I'm back in my office.

Kind regards
Siegmar

On 23.04.16 at 02:12, Gilles Gouaillardet wrote:
Siegmar,

I will try to reproduce this on my Solaris 11 x86_64 VM.

In the meantime, can you please double check that mca_pmix_pmix112.so is a 64-bit library? (e.g., confirm "-m64" was correctly passed to pmix)

Cheers,
Gilles

On Friday, April 22, 2016, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

Hi Ralph,

I've already used "--enable-debug". "SYSTEM_ENV" is "SunOS" or "Linux" and "MACHINE_ENV" is "sparc" or "x86_64".

mkdir openmpi-v2.x-dev-1280-gc110ae8-${SYSTEM_ENV}.${MACHINE_ENV}.64_gcc
cd openmpi-v2.x-dev-1280-gc110ae8-${SYSTEM_ENV}.${MACHINE_ENV}.64_gcc
../openmpi-v2.x-dev-1280-gc110ae8/configure \
  --prefix=/usr/local/openmpi-2.0.0_64_gcc \
  --libdir=/usr/local/openmpi-2.0.0_64_gcc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
  --with-jdk-headers=/usr/local/jdk1.8.0/include \
  JAVA_HOME=/usr/local/jdk1.8.0 \
  LDFLAGS="-m64" CC="gcc" CXX="g++" FC="gfortran" \
  CFLAGS="-m64" CXXFLAGS="-m64" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  --enable-mpi-cxx \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-heterogeneous \
  --enable-mpi-thread-multiple \
  --with-hwloc=internal \
  --without-verbs \
  --with-wrapper-cflags="-std=c11 -m64" \
  --with-wrapper-cxxflags="-m64" \
  --with-wrapper-fcflags="-m64" \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc

mkdir openmpi-v2.x-dev-1280-gc110ae8-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
cd openmpi-v2.x-dev-1280-gc110ae8-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
../openmpi-v2.x-dev-1280-gc110ae8/configure \
  --prefix=/usr/local/openmpi-2.0.0_64_cc \
  --libdir=/usr/local/openmpi-2.0.0_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
  --with-jdk-headers=/usr/local/jdk1.8.0/include \
  JAVA_HOME=/usr/local/jdk1.8.0 \
  LDFLAGS="-m64" CC="cc" CXX="CC" FC="f95" \
  CFLAGS="-m64" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  --enable-mpi-cxx \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-heterogeneous \
  --enable-mpi-thread-multiple \
  --with-hwloc=internal \
  --without-verbs \
  --with-wrapper-cflags="-m64" \
  --with-wrapper-cxxflags="-m64 -library=stlport4" \
  --with-wrapper-fcflags="-m64" \
  --with-wrapper-ldflags="" \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc

Kind regards
Siegmar

On 21.04.2016 at 18:18, Ralph Castain wrote:

Can you please rebuild OMPI with --enable-debug in the configure cmd? It will let us see more error output.

On Apr 21, 2016, at 8:52 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

Hi Ralph,

I don't see any additional information.

tyr hello_1 108 mpiexec -np 4 --host tyr,sunpc1,linpc1,ruester -mca mca_base_component_show_load_errors 1 hello_1_mpi
[tyr.informatik.hs-fulda.de:06211] [[48741,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-v2.x-dev-1280-gc110ae8/orte/mca/ess/hnp/ess_hnp_module.c at line 638
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.
This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_pmix_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------

tyr hello_1 109 mpiexec -np 4 --host tyr,sunpc1,linpc1,ruester -mca mca_base_component_show_load_errors 1 -mca pmix_base_verbose 10 -mca pmix_server_verbose 5 hello_1_mpi
[tyr.informatik.hs-fulda.de:06212] mca: base: components_register: registering framework pmix components
[tyr.informatik.hs-fulda.de:06212] mca: base: components_open: opening pmix components
[tyr.informatik.hs-fulda.de:06212] mca:base:select: Auto-selecting pmix components
[tyr.informatik.hs-fulda.de:06212] mca:base:select:( pmix) No component selected!
[tyr.informatik.hs-fulda.de:06212] [[48738,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-v2.x-dev-1280-gc110ae8/orte/mca/ess/hnp/ess_hnp_module.c at line 638
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_pmix_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
tyr hello_1 110

Kind regards
Siegmar

On 21.04.2016 at 17:24, Ralph Castain wrote:

Hmmm... it looks like you built the right components, but they are not being picked up. Can you run your mpiexec command again, adding "-mca mca_base_component_show_load_errors 1" to the cmd line?

On Apr 21, 2016, at 8:16 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

Hi Ralph,

I have attached the ompi_info output for both compilers from my Sparc machine and the listings for both compilers from the <prefix>/lib/openmpi directories. Hopefully that helps to find the problem.

hermes tmp 3 tar zvft openmpi-2.x_info.tar.gz
-rw-r--r-- root/root 10969 2016-04-21 17:06 ompi_info_SunOS_sparc_cc.txt
-rw-r--r-- root/root 11044 2016-04-21 17:06 ompi_info_SunOS_sparc_gcc.txt
-rw-r--r-- root/root 71252 2016-04-21 17:02 lib64_openmpi.txt
hermes tmp 4

Kind regards, and thank you very much once more for your help
Siegmar

On 21.04.2016 at 15:54, Ralph Castain wrote:

Odd - it would appear that none of the pmix components were built? Can you send along the output from ompi_info? Or just send a listing of the files in the <prefix>/lib/openmpi directory?

On Apr 21, 2016, at 1:27 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

Hi Ralph,

On 21.04.2016 at 00:18, Ralph Castain wrote:

Could you please rerun these tests and add "-mca pmix_base_verbose 10 -mca pmix_server_verbose 5" to your cmd line? I need to see why the pmix components failed.
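(For the <prefix>/lib/openmpi listing requested further up, a quick way to narrow it down to the pmix plugins and to see whether Open MPI can actually open them is sketched below; the gcc install prefix is taken from the configure command earlier in the thread and may differ on your system:

  # list the installed pmix plugins
  ls -l /usr/local/openmpi-2.0.0_64_gcc/lib64/openmpi | grep pmix
  # ask ompi_info which pmix components it can actually open
  /usr/local/openmpi-2.0.0_64_gcc/bin/ompi_info | grep -i "MCA pmix"

ompi_info only lists components it can open, so a plugin that shows up in the ls output but not in the ompi_info output points at a load problem rather than a missing build.)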
tyr spawn 111 mpiexec -np 1 --host tyr,sunpc1,linpc1,ruester -mca pmix_base_verbose 10 -mca pmix_server_verbose 5 spawn_multiple_master
[tyr.informatik.hs-fulda.de:26652] mca: base: components_register: registering framework pmix components
[tyr.informatik.hs-fulda.de:26652] mca: base: components_open: opening pmix components
[tyr.informatik.hs-fulda.de:26652] mca:base:select: Auto-selecting pmix components
[tyr.informatik.hs-fulda.de:26652] mca:base:select:( pmix) No component selected!
[tyr.informatik.hs-fulda.de:26652] [[52794,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-v2.x-dev-1280-gc110ae8/orte/mca/ess/hnp/ess_hnp_module.c at line 638
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_pmix_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
tyr spawn 112

tyr hello_1 116 mpiexec -np 1 --host tyr,sunpc1,linpc1,ruester -mca pmix_base_verbose 10 -mca pmix_server_verbose 5 hello_1_mpi
[tyr.informatik.hs-fulda.de:27261] mca: base: components_register: registering framework pmix components
[tyr.informatik.hs-fulda.de:27261] mca: base: components_open: opening pmix components
[tyr.informatik.hs-fulda.de:27261] mca:base:select: Auto-selecting pmix components
[tyr.informatik.hs-fulda.de:27261] mca:base:select:( pmix) No component selected!
[tyr.informatik.hs-fulda.de:27261] [[52315,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-v2.x-dev-1280-gc110ae8/orte/mca/ess/hnp/ess_hnp_module.c at line 638
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_pmix_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
tyr hello_1 117

Thank you very much for your help.
Kind regards
Siegmar

Thanks
Ralph

On Apr 20, 2016, at 10:12 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

Hi,

I have built openmpi-v2.x-dev-1280-gc110ae8 on my machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-5.1.0 and Sun C 5.13. Unfortunately I get runtime errors for some programs.

Sun C 5.13:
===========

For all my test programs I get the same error on Solaris Sparc and Solaris x86_64, while the programs work fine on Linux.

tyr hello_1 115 mpiexec -np 2 hello_1_mpi
[tyr.informatik.hs-fulda.de:22373] [[61763,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-v2.x-dev-1280-gc110ae8/orte/mca/ess/hnp/ess_hnp_module.c at line 638
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_pmix_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
tyr hello_1 116

GCC-5.1.0:
==========

tyr spawn 121 mpiexec -np 1 --host tyr,sunpc1,linpc1,ruester spawn_multiple_master
Parent process 0 running on tyr.informatik.hs-fulda.de
  I create 3 slave processes.
[tyr.informatik.hs-fulda.de:25366] PMIX ERROR: UNPACK-PAST-END in file ../../../../../../openmpi-v2.x-dev-1280-gc110ae8/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_ops.c at line 829
[tyr.informatik.hs-fulda.de:25366] PMIX ERROR: UNPACK-PAST-END in file ../../../../../../openmpi-v2.x-dev-1280-gc110ae8/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c at line 2176
[tyr:25377] *** An error occurred in MPI_Comm_spawn_multiple
[tyr:25377] *** reported by process [3308257281,0]
[tyr:25377] *** on communicator MPI_COMM_WORLD
[tyr:25377] *** MPI_ERR_SPAWN: could not spawn processes
[tyr:25377] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[tyr:25377] ***    and potentially your MPI job)
tyr spawn 122

I would be grateful if somebody could fix these problems. Thank you very much for any help in advance.
Kind regards
Siegmar

<hello_1_mpi.c> <spawn_multiple_master.c>
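PS regarding Gilles' question further up (whether mca_pmix_pmix112.so really was built as a 64-bit library): the ELF class can be inspected with file(1) on both Solaris and Linux. A minimal sketch, assuming the install prefixes from the configure commands earlier in the thread; adjust the paths if they differ, and skip files that are no longer present:

  # gcc build: expect something like "ELF 64-bit MSB dynamic lib SPARCV9 ..." on Solaris Sparc
  file /usr/local/openmpi-2.0.0_64_gcc/lib64/openmpi/mca_pmix_pmix112.so
  # Sun C build
  file /usr/local/openmpi-2.0.0_64_cc/lib64/openmpi/mca_pmix_pmix112.so
  # the embedded pmix configure output should also show -m64 if the flag was propagated;
  # the config.log path inside the build tree is a guess and may differ
  grep -- '-m64' opal/mca/pmix/pmix112/pmix/config.log

A library reported as "ELF 32-bit" here, while the rest of the installation is 64-bit, would explain why the component cannot be loaded and no pmix component is selected.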