Siegmar,

the fix is now being discussed at https://github.com/open-mpi/ompi/pull/1285

the other error you reported (MPI_Comm_spawn hanging on a heterogeneous cluster) is
being discussed at https://github.com/open-mpi/ompi/pull/1292


Cheers,

Gilles

On 1/14/2016 11:06 PM, Siegmar Gross wrote:
Hi,

I've successfully built openmpi-v2.x-dev-958-g7e94425 on my machine
(SUSE Linux Enterprise Server 12.0 x86_64) with gcc-5.2.0 and
Sun C 5.13. Unfortunately I get a runtime error for all programs
if I use the Sun compiler. Most of my small programs work as expected with
the GNU compiler. I used the following command to build the package
for cc.


mkdir openmpi-v2.x-dev-958-g7e94425-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
cd openmpi-v2.x-dev-958-g7e94425-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc

../openmpi-v2.x-dev-958-g7e94425/configure \
  --prefix=/usr/local/openmpi-2.0.0_64_cc \
  --libdir=/usr/local/openmpi-2.0.0_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
  --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
  JAVA_HOME=/usr/local/jdk1.8.0_66 \
  LDFLAGS="-m64" CC="cc" CXX="CC" FC="f95" \
  CFLAGS="-m64 -z noexecstack" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  --enable-mpi-cxx \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-heterogeneous \
  --enable-mpi-thread-multiple \
  --with-hwloc=internal \
  --without-verbs \
  --with-wrapper-cflags="-m64" \
  --with-wrapper-cxxflags="-m64 -library=stlport4" \
  --with-wrapper-fcflags="-m64" \
  --with-wrapper-ldflags="" \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc

make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc



loki hello_1 120 ompi_info | egrep -e "Open MPI repo revision:" -e "C compiler absolute:"
  Open MPI repo revision: v2.x-dev-958-g7e94425
     C compiler absolute: /opt/solstudio12.4/bin/cc


loki hello_1 120 mpiexec -np 3 --host loki --slot-list 0:0-5,1:0-5 hello_1_mpi
mpiexec: symbol lookup error: /usr/local/openmpi-2.0.0_64_cc/lib64/libpmix.so.2: undefined symbol: __builtin_clz
loki hello_1 121
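
For context: __builtin_clz is a GCC builtin, and the lookup error above shows that the library built with Sun C ended up referencing it as an ordinary external symbol that nothing defines. The following is only a minimal sketch of how such a builtin can be guarded with a portable fallback. The names portable_clz32 and CLZ32 are hypothetical and are not taken from Open MPI or PMIx, and this is not necessarily the approach taken in the actual fix.

```c
#include <stdint.h>

/* Hypothetical helper (not Open MPI/PMIx code): count the leading zero
 * bits of a 32-bit value without relying on any compiler builtin. */
static int portable_clz32(uint32_t x)
{
    int n = 0;

    if (x == 0) {
        return 32;  /* __builtin_clz(0) is undefined; this sketch returns 32 */
    }
    /* Shift left until the top bit is set, counting the shifts. */
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Use the builtin only when the compiler is known to provide it.
 * __SUNPRO_C is the macro predefined by the Sun/Oracle C compiler. */
#if defined(__GNUC__) && !defined(__SUNPRO_C)
#define CLZ32(x) ((x) ? __builtin_clz(x) : 32)
#else
#define CLZ32(x) portable_clz32(x)
#endif
```

With a guard like this, code calling CLZ32(value) compiles to the builtin under gcc and to the loop under cc, so no unresolved __builtin_clz reference remains in the shared library.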



I get the following error spawning a process and a different one
spawning multiple processes.


loki spawn 137 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

[loki:24531] [[49263,0],0] ORTE_ERROR_LOG: Not found in file ../../openmpi-v2.x-dev-958-g7e94425/orte/orted/pmix/pmix_server_fence.c at line 186
[loki:24531] [[49263,0],0] ORTE_ERROR_LOG: Not found in file ../../openmpi-v2.x-dev-958-g7e94425/orte/orted/pmix/pmix_server_fence.c at line 186
[loki:24531] [[49263,0],0] ORTE_ERROR_LOG: Not found in file ../../openmpi-v2.x-dev-958-g7e94425/orte/orted/pmix/pmix_server_fence.c at line 186
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_proc_complete_init failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
--------------------------------------------------------------------------
...




loki spawn 138 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_multiple_master

Parent process 0 running on loki
  I create 3 slave processes.

[loki:24717] PMIX ERROR: UNPACK-PAST-END in file ../../../../../../openmpi-v2.x-dev-958-g7e94425/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_ops.c at line 829
[loki:24717] PMIX ERROR: UNPACK-PAST-END in file ../../../../../../openmpi-v2.x-dev-958-g7e94425/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c at line 2176
[loki:24721] *** An error occurred in MPI_Comm_spawn_multiple
[loki:24721] *** reported by process [4281401345,0]
[loki:24721] *** on communicator MPI_COMM_WORLD
[loki:24721] *** MPI_ERR_SPAWN: could not spawn processes
[loki:24721] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[loki:24721] ***    and potentially your MPI job)
loki spawn 139



Everything works as expected for the following program.

loki spawn 139 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_intra_comm
Parent process 0: I create 2 slave processes

Parent process 0 running on loki
    MPI_COMM_WORLD ntasks:              1
    COMM_CHILD_PROCESSES ntasks_local:  1
    COMM_CHILD_PROCESSES ntasks_remote: 1
    COMM_ALL_PROCESSES ntasks:          2
    mytid in COMM_ALL_PROCESSES:        0

Child process 0 running on loki
    MPI_COMM_WORLD ntasks:              1
    COMM_ALL_PROCESSES ntasks:          2
    mytid in COMM_ALL_PROCESSES:        1
loki spawn 140



I would be grateful if somebody could fix the problem. Please let me
know if you need anything else. Thank you very much in advance for
any help.


Best regards

Siegmar
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2016/01/28273.php

