On https://github.com/open-mpi/ompi/pull/1385, Gilles indicated he would
update the patch and commit it on Monday.


> On Feb 20, 2016, at 12:48 AM, Siegmar Gross 
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
> 
> Hi Gilles,
> 
> Do you know when fixes for these problems will be ready? They still exist
> in the current version.
> 
> 
> tyr spawn 136 ompi_info | grep -e "Open MPI repo revision" -e "C compiler 
> absolute"
>  Open MPI repo revision: v2.x-dev-1108-gaaf15d9
>     C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc
> 
> 
> tyr spawn 137 mpiexec -np 1 --host tyr,sunpc1 spawn_multiple_master
> 
> Parent process 0 running on tyr.informatik.hs-fulda.de
>  I create 3 slave processes.
> 
> [tyr.informatik.hs-fulda.de:23580] PMIX ERROR: UNPACK-PAST-END in file 
> ../../../../../../openmpi-v2.x-dev-1108-gaaf15d9/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_ops.c
>  at line 829
> [tyr.informatik.hs-fulda.de:23580] PMIX ERROR: UNPACK-PAST-END in file 
> ../../../../../../openmpi-v2.x-dev-1108-gaaf15d9/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c
>  at line 2176
> [tyr:23587] *** An error occurred in MPI_Comm_spawn_multiple
> [tyr:23587] *** reported by process [4198105089,0]
> [tyr:23587] *** on communicator MPI_COMM_WORLD
> [tyr:23587] *** MPI_ERR_SPAWN: could not spawn processes
> [tyr:23587] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now 
> abort,
> [tyr:23587] ***    and potentially your MPI job)
> tyr spawn 138
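
For reference, the failing call in spawn_multiple_master is presumably along
these lines. This is only a minimal sketch: the actual test source is not
shown in this thread, and the slave binary name and the split into two
command blocks are assumptions.

/* Hypothetical minimal MPI_Comm_spawn_multiple reproducer; the
 * "spawn_slave" binary name and the two-command split are assumed. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm intercomm;
    char    *cmds[2]   = { "spawn_slave", "spawn_slave" };
    int      nprocs[2] = { 1, 2 };      /* 3 slaves in total, as above */
    MPI_Info infos[2]  = { MPI_INFO_NULL, MPI_INFO_NULL };

    MPI_Init(&argc, &argv);
    MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, nprocs, infos,
                            0, MPI_COMM_WORLD, &intercomm,
                            MPI_ERRCODES_IGNORE);
    MPI_Finalize();
    return 0;
}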
> 
> 
> 
> 
> tyr spawn 115 ompi_info | grep -e "Open MPI repo revision" -e "C compiler 
> absolute"
>  Open MPI repo revision: v2.x-dev-1108-gaaf15d9
>     C compiler absolute: /opt/solstudio12.4/bin/cc
> 
> 
> tyr spawn 116 mpiexec -np 1 --host tyr,sunpc1 spawn_multiple_master
> [tyr.informatik.hs-fulda.de:28715] [[54797,0],0] ORTE_ERROR_LOG: Not found in 
> file 
> ../../../../../openmpi-v2.x-dev-1108-gaaf15d9/orte/mca/ess/hnp/ess_hnp_module.c
>  at line 638
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>  opal_pmix_base_select failed
>  --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> tyr spawn 117
> 
> 
> 
> Kind regards
> 
> Siegmar
> 
> 
> 
> On 01/15/16 08:03, Gilles Gouaillardet wrote:
>> Siegmar,
>> 
>> the fix is now being discussed at https://github.com/open-mpi/ompi/pull/1285
>> 
>> the other error you reported (MPI_Comm_spawn hanging on a heterogeneous
>> cluster) is being discussed at https://github.com/open-mpi/ompi/pull/1292
>> 
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On 1/14/2016 11:06 PM, Siegmar Gross wrote:
>>> Hi,
>>> 
>>> I've successfully built openmpi-v2.x-dev-958-g7e94425 on my machine
>>> (SUSE Linux Enterprise Server 12.0 x86_64) with gcc-5.2.0 and
>>> Sun C 5.13. Unfortunately I get a runtime error for all programs
>>> if I use the Sun compiler. Most of my small programs work as expected
>>> with the GNU compiler. I used the following command to build the
>>> package for cc.
>>> 
>>> 
>>> mkdir openmpi-v2.x-dev-958-g7e94425-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
>>> cd openmpi-v2.x-dev-958-g7e94425-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
>>> 
>>> ../openmpi-v2.x-dev-958-g7e94425/configure \
>>>  --prefix=/usr/local/openmpi-2.0.0_64_cc \
>>>  --libdir=/usr/local/openmpi-2.0.0_64_cc/lib64 \
>>>  --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
>>>  --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
>>>  JAVA_HOME=/usr/local/jdk1.8.0_66 \
>>>  LDFLAGS="-m64" CC="cc" CXX="CC" FC="f95" \
>>>  CFLAGS="-m64 -z noexecstack" CXXFLAGS="-m64 -library=stlport4" 
>>> FCFLAGS="-m64" \
>>>  CPP="cpp" CXXCPP="cpp" \
>>>  --enable-mpi-cxx \
>>>  --enable-cxx-exceptions \
>>>  --enable-mpi-java \
>>>  --enable-heterogeneous \
>>>  --enable-mpi-thread-multiple \
>>>  --with-hwloc=internal \
>>>  --without-verbs \
>>>  --with-wrapper-cflags="-m64" \
>>>  --with-wrapper-cxxflags="-m64 -library=stlport4" \
>>>  --with-wrapper-fcflags="-m64" \
>>>  --with-wrapper-ldflags="" \
>>>  --enable-debug \
>>>  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>>> 
>>> make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>>> 
>>> 
>>> 
>>> loki hello_1 120 ompi_info | egrep -e "Open MPI repo revision:" -e "C 
>>> compiler absolute:"
>>>  Open MPI repo revision: v2.x-dev-958-g7e94425
>>>     C compiler absolute: /opt/solstudio12.4/bin/cc
>>> 
>>> 
>>> loki hello_1 120 mpiexec -np 3 --host loki --slot-list 0:0-5,1:0-5 
>>> hello_1_mpi
>>> mpiexec: symbol lookup error: 
>>> /usr/local/openmpi-2.0.0_64_cc/lib64/libpmix.so.2: undefined symbol: 
>>> __builtin_clz
>>> loki hello_1 121
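
For what it's worth, __builtin_clz is a GCC intrinsic (count leading zeros),
so it goes unresolved when libpmix is compiled with Sun Studio's cc. A
portable fallback would look roughly like the sketch below; this is only an
illustration (the pmix_clz name is made up here), not the actual change
being discussed in the pull request.

/* Illustrative portable fallback for compilers without __builtin_clz
 * (e.g. Sun Studio cc); not the actual fix. */
#if defined(__GNUC__)
#define pmix_clz(x) __builtin_clz(x)
#else
static int pmix_clz(unsigned int x)
{
    int n = 0;
    if (0 == x) {                       /* __builtin_clz(0) is undefined */
        return 8 * (int)sizeof(unsigned int);
    }
    /* shift left until the top bit is set, counting the shifts */
    while (0 == (x & (1u << (8 * sizeof(unsigned int) - 1)))) {
        x <<= 1;
        n++;
    }
    return n;
}
#endif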
>>> 
>>> 
>>> 
>>> I get the following error when spawning a single process and a
>>> different one when spawning multiple processes.
>>> 
>>> 
>>> loki spawn 137 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 
>>> spawn_master
>>> 
>>> Parent process 0 running on loki
>>>  I create 4 slave processes
>>> 
>>> [loki:24531] [[49263,0],0] ORTE_ERROR_LOG: Not found in file
>>> ../../openmpi-v2.x-dev-958-g7e94425/orte/orted/pmix/pmix_server_fence.c at 
>>> line 186
>>> [loki:24531] [[49263,0],0] ORTE_ERROR_LOG: Not found in file
>>> ../../openmpi-v2.x-dev-958-g7e94425/orte/orted/pmix/pmix_server_fence.c at 
>>> line 186
>>> [loki:24531] [[49263,0],0] ORTE_ERROR_LOG: Not found in file
>>> ../../openmpi-v2.x-dev-958-g7e94425/orte/orted/pmix/pmix_server_fence.c at 
>>> line 186
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or environment
>>> problems.  This failure appears to be an internal failure; here's some
>>> additional information (which may only be relevant to an Open MPI
>>> developer):
>>> 
>>>  ompi_proc_complete_init failed
>>>  --> Returned "Not found" (-13) instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> ***    and potentially your MPI job)
>>> --------------------------------------------------------------------------
>>> ...
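
The failing call here is plain MPI_Comm_spawn. A minimal sketch of the
pattern (the slave binary name is again a placeholder) would be:

/* Hypothetical minimal MPI_Comm_spawn reproducer: the parent spawns
 * 4 children, which then die in MPI_Init as shown above. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_spawn("spawn_slave", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
    MPI_Finalize();
    return 0;
}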
>>> 
>>> 
>>> 
>>> 
>>> loki spawn 138 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 
>>> spawn_multiple_master
>>> 
>>> Parent process 0 running on loki
>>>  I create 3 slave processes.
>>> 
>>> [loki:24717] PMIX ERROR: UNPACK-PAST-END in file
>>> ../../../../../../openmpi-v2.x-dev-958-g7e94425/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_ops.c
>>>  at line 829
>>> [loki:24717] PMIX ERROR: UNPACK-PAST-END in file
>>> ../../../../../../openmpi-v2.x-dev-958-g7e94425/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c
>>>  at line 2176
>>> [loki:24721] *** An error occurred in MPI_Comm_spawn_multiple
>>> [loki:24721] *** reported by process [4281401345,0]
>>> [loki:24721] *** on communicator MPI_COMM_WORLD
>>> [loki:24721] *** MPI_ERR_SPAWN: could not spawn processes
>>> [loki:24721] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
>>> now abort,
>>> [loki:24721] ***    and potentially your MPI job)
>>> loki spawn 139
>>> 
>>> 
>>> 
>>> Everything works as expected for the following program.
>>> 
>>> loki spawn 139 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 
>>> spawn_intra_comm
>>> Parent process 0: I create 2 slave processes
>>> 
>>> Parent process 0 running on loki
>>>    MPI_COMM_WORLD ntasks:              1
>>>    COMM_CHILD_PROCESSES ntasks_local:  1
>>>    COMM_CHILD_PROCESSES ntasks_remote: 1
>>>    COMM_ALL_PROCESSES ntasks:          2
>>>    mytid in COMM_ALL_PROCESSES:        0
>>> 
>>> Child process 0 running on loki
>>>    MPI_COMM_WORLD ntasks:              1
>>>    COMM_ALL_PROCESSES ntasks:          2
>>>    mytid in COMM_ALL_PROCESSES:        1
>>> loki spawn 140
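
spawn_intra_comm presumably obtains COMM_ALL_PROCESSES by merging the spawn
intercommunicator with MPI_Intercomm_merge. A rough parent-side
reconstruction from the output labels (all other details are assumed):

/* Hypothetical parent-side reconstruction of spawn_intra_comm,
 * based only on the output labels above. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm comm_child_processes, comm_all_processes;
    int ntasks, mytid;

    MPI_Init(&argc, &argv);
    MPI_Comm_spawn("spawn_intra_comm", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &comm_child_processes,
                   MPI_ERRCODES_IGNORE);
    /* high = 0: the parent group is ordered first in the merged comm */
    MPI_Intercomm_merge(comm_child_processes, 0, &comm_all_processes);
    MPI_Comm_size(comm_all_processes, &ntasks);
    MPI_Comm_rank(comm_all_processes, &mytid);
    printf("COMM_ALL_PROCESSES ntasks: %d, mytid: %d\n", ntasks, mytid);
    MPI_Finalize();
    return 0;
}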
>>> 
>>> 
>>> 
>>> I would be grateful if somebody could fix these problems. Please let me
>>> know if you need anything else. Thank you very much in advance for any
>>> help.
>>> 
>>> 
>>> Best regards
>>> 
>>> Siegmar
