On https://github.com/open-mpi/ompi/pull/1385, Gilles indicated he would update the patch and commit it on Monday.
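
In case anyone wants to reproduce this while the fix is pending: the spawn_multiple_master source is not attached to this thread, but a minimal MPI_Comm_spawn_multiple test along the same lines looks roughly like the sketch below. The slave binary name "spawn_slave" and the 1+2 process split are placeholders, not Siegmar's actual code.

/* spawn_multiple_master-style sketch (hypothetical, for illustration only).
 * One parent process spawns 3 slaves from two command entries via
 * MPI_Comm_spawn_multiple and reports what it is doing. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    char *commands[2] = { "spawn_slave", "spawn_slave" };  /* placeholder binary */
    int   maxprocs[2] = { 1, 2 };                          /* 3 slaves in total  */
    MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };
    int errcodes[3], rank, len;
    MPI_Comm intercomm;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);

    if (rank == 0) {
        printf("Parent process %d running on %s\n", rank, host);
        printf("I create 3 slave processes.\n");
    }

    /* Collective over MPI_COMM_WORLD; returns an intercommunicator to the slaves. */
    MPI_Comm_spawn_multiple(2, commands, MPI_ARGVS_NULL, maxprocs, infos, 0,
                            MPI_COMM_WORLD, &intercomm, errcodes);

    /* MPI_Finalize is collective over the connected slave processes as well. */
    MPI_Finalize();
    return 0;
}

Compiled with mpicc and started the same way as in the reports below (e.g. mpiexec -np 1 --host tyr,sunpc1 ./spawn_multiple_master), it should go through the same MPI_Comm_spawn_multiple path that the reports show failing with UNPACK-PAST-END / MPI_ERR_SPAWN.
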
> On Feb 20, 2016, at 12:48 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> Hi Gilles,
>
> do you know when fixes for the problems will be ready? They still exist
> in the current version.
>
>
> tyr spawn 136 ompi_info | grep -e "Open MPI repo revision" -e "C compiler absolute"
> Open MPI repo revision: v2.x-dev-1108-gaaf15d9
> C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc
>
>
> tyr spawn 137 mpiexec -np 1 --host tyr,sunpc1 spawn_multiple_master
>
> Parent process 0 running on tyr.informatik.hs-fulda.de
> I create 3 slave processes.
>
> [tyr.informatik.hs-fulda.de:23580] PMIX ERROR: UNPACK-PAST-END in file ../../../../../../openmpi-v2.x-dev-1108-gaaf15d9/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_ops.c at line 829
> [tyr.informatik.hs-fulda.de:23580] PMIX ERROR: UNPACK-PAST-END in file ../../../../../../openmpi-v2.x-dev-1108-gaaf15d9/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c at line 2176
> [tyr:23587] *** An error occurred in MPI_Comm_spawn_multiple
> [tyr:23587] *** reported by process [4198105089,0]
> [tyr:23587] *** on communicator MPI_COMM_WORLD
> [tyr:23587] *** MPI_ERR_SPAWN: could not spawn processes
> [tyr:23587] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [tyr:23587] *** and potentially your MPI job)
> tyr spawn 138
>
>
> tyr spawn 115 ompi_info | grep -e "Open MPI repo revision" -e "C compiler absolute"
> Open MPI repo revision: v2.x-dev-1108-gaaf15d9
> C compiler absolute: /opt/solstudio12.4/bin/cc
>
>
> tyr spawn 116 mpiexec -np 1 --host tyr,sunpc1 spawn_multiple_master
> [tyr.informatik.hs-fulda.de:28715] [[54797,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-v2.x-dev-1108-gaaf15d9/orte/mca/ess/hnp/ess_hnp_module.c at line 638
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> opal_pmix_base_select failed
> --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> tyr spawn 117
>
>
> Kind regards
>
> Siegmar
>
>
> On 01/15/16 08:03, Gilles Gouaillardet wrote:
>> Siegmar,
>>
>> the fix is now being discussed at https://github.com/open-mpi/ompi/pull/1285
>>
>> the other error you reported (MPI_Comm_spawn hanging on a heterogeneous cluster) is
>> being discussed at https://github.com/open-mpi/ompi/pull/1292
>>
>>
>> Cheers,
>>
>> Gilles
>>
>> On 1/14/2016 11:06 PM, Siegmar Gross wrote:
>>> Hi,
>>>
>>> I've successfully built openmpi-v2.x-dev-958-g7e94425 on my machine
>>> (SUSE Linux Enterprise Server 12.0 x86_64) with gcc-5.2.0 and
>>> Sun C 5.13. Unfortunately I get a runtime error for all programs
>>> if I use the Sun compiler. Most of my small programs work as expected
>>> with the GNU compiler. I used the following command to build the
>>> package for cc.
>>>
>>> mkdir openmpi-v2.x-dev-958-g7e94425-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
>>> cd openmpi-v2.x-dev-958-g7e94425-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
>>>
>>> ../openmpi-v2.x-dev-958-g7e94425/configure \
>>> --prefix=/usr/local/openmpi-2.0.0_64_cc \
>>> --libdir=/usr/local/openmpi-2.0.0_64_cc/lib64 \
>>> --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
>>> --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
>>> JAVA_HOME=/usr/local/jdk1.8.0_66 \
>>> LDFLAGS="-m64" CC="cc" CXX="CC" FC="f95" \
>>> CFLAGS="-m64 -z noexecstack" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
>>> CPP="cpp" CXXCPP="cpp" \
>>> --enable-mpi-cxx \
>>> --enable-cxx-exceptions \
>>> --enable-mpi-java \
>>> --enable-heterogeneous \
>>> --enable-mpi-thread-multiple \
>>> --with-hwloc=internal \
>>> --without-verbs \
>>> --with-wrapper-cflags="-m64" \
>>> --with-wrapper-cxxflags="-m64 -library=stlport4" \
>>> --with-wrapper-fcflags="-m64" \
>>> --with-wrapper-ldflags="" \
>>> --enable-debug \
>>> |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>>>
>>> make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>>>
>>>
>>> loki hello_1 120 ompi_info | egrep -e "Open MPI repo revision:" -e "C compiler absolute:"
>>> Open MPI repo revision: v2.x-dev-958-g7e94425
>>> C compiler absolute: /opt/solstudio12.4/bin/cc
>>>
>>>
>>> loki hello_1 120 mpiexec -np 3 --host loki --slot-list 0:0-5,1:0-5 hello_1_mpi
>>> mpiexec: symbol lookup error: /usr/local/openmpi-2.0.0_64_cc/lib64/libpmix.so.2: undefined symbol: __builtin_clz
>>> loki hello_1 121
>>>
>>>
>>> I get the following error spawning a process and a different one
>>> spawning multiple processes.
>>>
>>>
>>> loki spawn 137 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master
>>>
>>> Parent process 0 running on loki
>>> I create 4 slave processes
>>>
>>> [loki:24531] [[49263,0],0] ORTE_ERROR_LOG: Not found in file ../../openmpi-v2.x-dev-958-g7e94425/orte/orted/pmix/pmix_server_fence.c at line 186
>>> [loki:24531] [[49263,0],0] ORTE_ERROR_LOG: Not found in file ../../openmpi-v2.x-dev-958-g7e94425/orte/orted/pmix/pmix_server_fence.c at line 186
>>> [loki:24531] [[49263,0],0] ORTE_ERROR_LOG: Not found in file ../../openmpi-v2.x-dev-958-g7e94425/orte/orted/pmix/pmix_server_fence.c at line 186
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or environment
>>> problems. This failure appears to be an internal failure; here's some
>>> additional information (which may only be relevant to an Open MPI
>>> developer):
>>>
>>> ompi_proc_complete_init failed
>>> --> Returned "Not found" (-13) instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> *** and potentially your MPI job)
>>> --------------------------------------------------------------------------
>>> ...
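
As an aside, the single-command MPI_Comm_spawn case that fails here can be exercised with a self-contained program that spawns copies of itself; this is only a sketch of the usual pattern, not Siegmar's spawn_master:

/* Self-spawning MPI_Comm_spawn sketch (hypothetical, for illustration only).
 * Started with one process, it spawns 4 copies of its own binary; the copies
 * detect via MPI_Comm_get_parent that they are the children. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, errcodes[4];
    MPI_Comm parent, intercomm;

    MPI_Init(&argc, &argv);          /* in the failure above, the spawned side aborts here */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* No parent: act as the master and spawn 4 children of this binary. */
        if (rank == 0)
            printf("Parent process %d: I create 4 slave processes\n", rank);
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 4, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &intercomm, errcodes);
        MPI_Comm_disconnect(&intercomm);   /* matches the children's disconnect */
    } else {
        /* We were spawned: report and disconnect from the parent job. */
        printf("Child process %d\n", rank);
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}

Launched with the same command line as above (mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5), the interesting part is the children's MPI_Init, which is where the failure above is reported.
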
>>>
>>> loki spawn 138 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_multiple_master
>>>
>>> Parent process 0 running on loki
>>> I create 3 slave processes.
>>>
>>> [loki:24717] PMIX ERROR: UNPACK-PAST-END in file ../../../../../../openmpi-v2.x-dev-958-g7e94425/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_ops.c at line 829
>>> [loki:24717] PMIX ERROR: UNPACK-PAST-END in file ../../../../../../openmpi-v2.x-dev-958-g7e94425/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c at line 2176
>>> [loki:24721] *** An error occurred in MPI_Comm_spawn_multiple
>>> [loki:24721] *** reported by process [4281401345,0]
>>> [loki:24721] *** on communicator MPI_COMM_WORLD
>>> [loki:24721] *** MPI_ERR_SPAWN: could not spawn processes
>>> [loki:24721] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> [loki:24721] *** and potentially your MPI job)
>>> loki spawn 139
>>>
>>>
>>> Everything works as expected for the following program.
>>>
>>> loki spawn 139 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_intra_comm
>>> Parent process 0: I create 2 slave processes
>>>
>>> Parent process 0 running on loki
>>> MPI_COMM_WORLD ntasks: 1
>>> COMM_CHILD_PROCESSES ntasks_local: 1
>>> COMM_CHILD_PROCESSES ntasks_remote: 1
>>> COMM_ALL_PROCESSES ntasks: 2
>>> mytid in COMM_ALL_PROCESSES: 0
>>>
>>> Child process 0 running on loki
>>> MPI_COMM_WORLD ntasks: 1
>>> COMM_ALL_PROCESSES ntasks: 2
>>> mytid in COMM_ALL_PROCESSES: 1
>>> loki spawn 140
>>>
>>>
>>> I would be grateful if somebody can fix the problem. Please let me
>>> know if you need anything else. Thank you very much for any help in
>>> advance.
>>>
>>>
>>> Best regards
>>>
>>> Siegmar
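
For completeness: the spawn_intra_comm source is not in this thread either, but judging from its output it spawns slaves and merges the resulting intercommunicator into a single intra-communicator (the COMM_ALL_PROCESSES in the output above). A minimal sketch of that pattern, with the process counts and printed labels only loosely following Siegmar's output:

/* spawn_intra_comm-style sketch (hypothetical, for illustration only).
 * The parent spawns 2 copies of this binary and both sides merge the
 * intercommunicator into one intra-communicator. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int ntasks, nlocal, nremote, nall, mytid, errcodes[2];
    MPI_Comm parent, child_comm, all_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Parent: spawn 2 children of this binary. */
        printf("Parent process 0: I create 2 slave processes\n");
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &child_comm, errcodes);
        MPI_Comm_size(child_comm, &nlocal);          /* size of the local group   */
        MPI_Comm_remote_size(child_comm, &nremote);  /* size of the spawned group */
        printf("COMM_CHILD_PROCESSES ntasks_local: %d ntasks_remote: %d\n",
               nlocal, nremote);
        /* high = 0: the parent group is ordered first in the merged communicator. */
        MPI_Intercomm_merge(child_comm, 0, &all_comm);
    } else {
        /* Child: merge with the parent job, ordered after it. */
        MPI_Intercomm_merge(parent, 1, &all_comm);
    }

    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);
    MPI_Comm_size(all_comm, &nall);
    MPI_Comm_rank(all_comm, &mytid);
    printf("MPI_COMM_WORLD ntasks: %d  COMM_ALL_PROCESSES ntasks: %d  mytid: %d\n",
           ntasks, nall, mytid);

    MPI_Comm_free(&all_comm);
    MPI_Finalize();
    return 0;
}

The high argument to MPI_Intercomm_merge only controls the relative ordering of the two groups in the merged communicator.
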