Can you please rebuild OMPI with --enable-debug in the configure cmd? It will let us see more error output.
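For reference, a debug rebuild might be set up roughly as sketched below. Only --enable-debug comes from this thread. The source directory name is taken from the paths in the log output quoted below, and the install prefix is a placeholder I made up, so substitute your own values and keep whatever other configure options you normally use. The script only prints the configure line; it does not run anything.

```shell
#!/bin/sh
# Sketch only: prints a configure command line with debugging enabled.
# SRC_DIR is the source tree named in the quoted logs (assumed to sit next
# to the build directory); PREFIX is a hypothetical install location.
SRC_DIR="${SRC_DIR:-../openmpi-v2.x-dev-1280-gc110ae8}"
PREFIX="${PREFIX:-$HOME/openmpi-debug}"

# Print the configure command line, adding --enable-debug.
configure_cmd() {
    echo "$SRC_DIR/configure --prefix=$PREFIX --enable-debug"
}

configure_cmd
# To actually rebuild, run the printed command from the build directory,
# then:
#   make && make install
```

After reinstalling, rerunning the same mpiexec command with "-mca pmix_base_verbose 10 -mca pmix_server_verbose 5" should show whether the debug build reports anything more about the pmix components.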

> On Apr 21, 2016, at 8:52 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> Hi Ralph,
>
> I don't see any additional information.
>
> tyr hello_1 108 mpiexec -np 4 --host tyr,sunpc1,linpc1,ruester -mca mca_base_component_show_load_errors 1 hello_1_mpi
> [tyr.informatik.hs-fulda.de:06211] [[48741,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-v2.x-dev-1280-gc110ae8/orte/mca/ess/hnp/ess_hnp_module.c at line 638
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_pmix_base_select failed
>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
>
>
> tyr hello_1 109 mpiexec -np 4 --host tyr,sunpc1,linpc1,ruester -mca mca_base_component_show_load_errors 1 -mca pmix_base_verbose 10 -mca pmix_server_verbose 5 hello_1_mpi
> [tyr.informatik.hs-fulda.de:06212] mca: base: components_register: registering framework pmix components
> [tyr.informatik.hs-fulda.de:06212] mca: base: components_open: opening pmix components
> [tyr.informatik.hs-fulda.de:06212] mca:base:select: Auto-selecting pmix components
> [tyr.informatik.hs-fulda.de:06212] mca:base:select:( pmix) No component selected!
> [tyr.informatik.hs-fulda.de:06212] [[48738,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-v2.x-dev-1280-gc110ae8/orte/mca/ess/hnp/ess_hnp_module.c at line 638
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_pmix_base_select failed
>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> tyr hello_1 110
>
>
> Kind regards
>
> Siegmar
>
>
> On 21.04.2016 at 17:24, Ralph Castain wrote:
>> Hmmm…it looks like you built the right components, but they are not being
>> picked up. Can you run your mpiexec command again, adding “-mca
>> mca_base_component_show_load_errors 1” to the cmd line?
>>
>>
>>> On Apr 21, 2016, at 8:16 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>
>>> Hi Ralph,
>>>
>>> I have attached ompi_info output for both compilers from my
>>> sparc machine and the listings for both compilers from the
>>> <prefix>/lib/openmpi directories. Hopefully that helps to
>>> find the problem.
>>>
>>> hermes tmp 3 tar zvft openmpi-2.x_info.tar.gz
>>> -rw-r--r-- root/root 10969 2016-04-21 17:06 ompi_info_SunOS_sparc_cc.txt
>>> -rw-r--r-- root/root 11044 2016-04-21 17:06 ompi_info_SunOS_sparc_gcc.txt
>>> -rw-r--r-- root/root 71252 2016-04-21 17:02 lib64_openmpi.txt
>>> hermes tmp 4
>>>
>>>
>>> Kind regards and thank you very much once more for your help
>>>
>>> Siegmar
>>>
>>>
>>> On 21.04.2016 at 15:54, Ralph Castain wrote:
>>>> Odd - it would appear that none of the pmix components built? Can you send
>>>> along the output from ompi_info? Or just send a listing of the files in the
>>>> <prefix>/lib/openmpi directory?
>>>>
>>>>
>>>>> On Apr 21, 2016, at 1:27 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>>
>>>>> Hi Ralph,
>>>>>
>>>>> On 21.04.2016 at 00:18, Ralph Castain wrote:
>>>>>> Could you please rerun these tests and add “-mca pmix_base_verbose 10
>>>>>> -mca pmix_server_verbose 5” to your cmd line? I need to see why the
>>>>>> pmix components failed.
>>>>>
>>>>>
>>>>> tyr spawn 111 mpiexec -np 1 --host tyr,sunpc1,linpc1,ruester -mca pmix_base_verbose 10 -mca pmix_server_verbose 5 spawn_multiple_master
>>>>> [tyr.informatik.hs-fulda.de:26652] mca: base: components_register: registering framework pmix components
>>>>> [tyr.informatik.hs-fulda.de:26652] mca: base: components_open: opening pmix components
>>>>> [tyr.informatik.hs-fulda.de:26652] mca:base:select: Auto-selecting pmix components
>>>>> [tyr.informatik.hs-fulda.de:26652] mca:base:select:( pmix) No component selected!
>>>>> [tyr.informatik.hs-fulda.de:26652] [[52794,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-v2.x-dev-1280-gc110ae8/orte/mca/ess/hnp/ess_hnp_module.c at line 638
>>>>> --------------------------------------------------------------------------
>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during orte_init; some of which are due to configuration or
>>>>> environment problems. This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>>
>>>>>   opal_pmix_base_select failed
>>>>>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
>>>>> --------------------------------------------------------------------------
>>>>> tyr spawn 112
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> tyr hello_1 116 mpiexec -np 1 --host tyr,sunpc1,linpc1,ruester -mca pmix_base_verbose 10 -mca pmix_server_verbose 5 hello_1_mpi
>>>>> [tyr.informatik.hs-fulda.de:27261] mca: base: components_register: registering framework pmix components
>>>>> [tyr.informatik.hs-fulda.de:27261] mca: base: components_open: opening pmix components
>>>>> [tyr.informatik.hs-fulda.de:27261] mca:base:select: Auto-selecting pmix components
>>>>> [tyr.informatik.hs-fulda.de:27261] mca:base:select:( pmix) No component selected!
>>>>> [tyr.informatik.hs-fulda.de:27261] [[52315,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-v2.x-dev-1280-gc110ae8/orte/mca/ess/hnp/ess_hnp_module.c at line 638
>>>>> --------------------------------------------------------------------------
>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during orte_init; some of which are due to configuration or
>>>>> environment problems. This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>>
>>>>>   opal_pmix_base_select failed
>>>>>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
>>>>> --------------------------------------------------------------------------
>>>>> tyr hello_1 117
>>>>>
>>>>>
>>>>>
>>>>> Thank you very much for your help.
>>>>>
>>>>>
>>>>> Kind regards
>>>>>
>>>>> Siegmar
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Ralph
>>>>>>
>>>>>>> On Apr 20, 2016, at 10:12 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have built openmpi-v2.x-dev-1280-gc110ae8 on my machines
>>>>>>> (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux
>>>>>>> 12.1 x86_64) with gcc-5.1.0 and Sun C 5.13. Unfortunately I get
>>>>>>> runtime errors for some programs.
>>>>>>>
>>>>>>>
>>>>>>> Sun C 5.13:
>>>>>>> ===========
>>>>>>>
>>>>>>> For all my test programs I get the same error on Solaris Sparc and
>>>>>>> Solaris x86_64, while the programs work fine on Linux.
>>>>>>>
>>>>>>> tyr hello_1 115 mpiexec -np 2 hello_1_mpi
>>>>>>> [tyr.informatik.hs-fulda.de:22373] [[61763,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-v2.x-dev-1280-gc110ae8/orte/mca/ess/hnp/ess_hnp_module.c at line 638
>>>>>>> --------------------------------------------------------------------------
>>>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>> Open MPI developer):
>>>>>>>
>>>>>>>   opal_pmix_base_select failed
>>>>>>>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
>>>>>>> --------------------------------------------------------------------------
>>>>>>> tyr hello_1 116
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> GCC-5.1.0:
>>>>>>> ==========
>>>>>>>
>>>>>>> tyr spawn 121 mpiexec -np 1 --host tyr,sunpc1,linpc1,ruester spawn_multiple_master
>>>>>>>
>>>>>>> Parent process 0 running on tyr.informatik.hs-fulda.de
>>>>>>> I create 3 slave processes.
>>>>>>>
>>>>>>> [tyr.informatik.hs-fulda.de:25366] PMIX ERROR: UNPACK-PAST-END in file ../../../../../../openmpi-v2.x-dev-1280-gc110ae8/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_ops.c at line 829
>>>>>>> [tyr.informatik.hs-fulda.de:25366] PMIX ERROR: UNPACK-PAST-END in file ../../../../../../openmpi-v2.x-dev-1280-gc110ae8/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c at line 2176
>>>>>>> [tyr:25377] *** An error occurred in MPI_Comm_spawn_multiple
>>>>>>> [tyr:25377] *** reported by process [3308257281,0]
>>>>>>> [tyr:25377] *** on communicator MPI_COMM_WORLD
>>>>>>> [tyr:25377] *** MPI_ERR_SPAWN: could not spawn processes
>>>>>>> [tyr:25377] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>>>> [tyr:25377] *** and potentially your MPI job)
>>>>>>> tyr spawn 122
>>>>>>>
>>>>>>>
>>>>>>> I would be grateful if somebody can fix the problems. Thank you very
>>>>>>> much for any help in advance.
>>>>>>>
>>>>>>>
>>>>>>> Kind regards
>>>>>>>
>>>>>>> Siegmar
>>>>>>> <hello_1_mpi.c><spawn_multiple_master.c>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>> Link to this post: http://www.open-mpi.org/community/lists/users/2016/04/28983.php