Hi,

today I installed openmpi-dev-602-g82c02b4 on my machines (Solaris 10 SPARC,
Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and the
new Solaris Studio 12.4 compilers. All build processes finished without
errors, but I have a problem running a very small program: it works with
three processes but hangs with six. The behaviour is the same with both
compilers.

tyr small_prog 139 time; mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize; time
827.161u 210.126s 30:51.08 56.0%        0+0k 4151+20io 2898pf+0w
Hello!
Hello!
Hello!
827.886u 210.335s 30:54.68 55.9%        0+0k 4151+20io 2898pf+0w
tyr small_prog 140 time; mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize; time
827.946u 210.370s 31:15.02 55.3%        0+0k 4151+20io 2898pf+0w
^CKilled by signal 2.
Killed by signal 2.
869.242u 221.644s 33:40.54 53.9%        0+0k 4151+20io 2898pf+0w
tyr small_prog 141 

tyr small_prog 145 ompi_info | grep -e "Open MPI repo revision:" -e "C compiler:"
  Open MPI repo revision: dev-602-g82c02b4
              C compiler: cc
tyr small_prog 146 


tyr small_prog 146 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
GNU gdb (GDB) 7.6.1
...
(gdb) run -np 3 --host sunpc1,linpc1,tyr init_finalize
Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
Hello!
Hello!
Hello!
[LWP    2         exited]
[New Thread 2        ]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
(gdb) run -np 6 --host sunpc1,linpc1,tyr init_finalize
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
^CKilled by signal 2.
Killed by signal 2.

Program received signal SIGINT, Interrupt.
[Switching to Thread 1 (LWP 1)]
0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
(gdb) bt
#0  0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
#1  0xffffffff7d1cb468 in _pollsys () from /lib/sparcv9/libc.so.1
#2  0xffffffff7d170ed8 in poll () from /lib/sparcv9/libc.so.1
#3  0xffffffff7e69a630 in poll_dispatch ()
   from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
#4  0xffffffff7e6894ec in opal_libevent2021_event_base_loop ()
   from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
#5  0x000000010000eb14 in orterun (argc=1757447168, argv=0xffffff7ed8550cff)
    at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/orterun.c:1090
#6  0x0000000100004e2c in main (argc=256, argv=0xffffff7ed8af5c00)
    at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/main.c:13
(gdb) 

Any ideas? Unfortunately I'm leaving for vacation, so I won't be able to
test any patches until the end of the year. Nevertheless I wanted to report
the problem. At the moment I cannot check whether I see the same behaviour
in a homogeneous environment with three machines, because the new version
won't be available on the other machines before tomorrow. I used the
following configure command.

../openmpi-dev-602-g82c02b4/configure --prefix=/usr/local/openmpi-1.9.0_64_cc \
  --libdir=/usr/local/openmpi-1.9.0_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
  --with-jdk-headers=/usr/local/jdk1.8.0/include \
  JAVA_HOME=/usr/local/jdk1.8.0 \
  LDFLAGS="-m64 -mt" \
  CC="cc" CXX="CC" FC="f95" \
  CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  CPPFLAGS="" CXXCPPFLAGS="" \
  --enable-mpi-cxx \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-heterogeneous \
  --enable-mpi-thread-multiple \
  --with-threads=posix \
  --with-hwloc=internal \
  --without-verbs \
  --with-wrapper-cflags="-m64 -mt" \
  --with-wrapper-cxxflags="-m64 -library=stlport4" \
  --with-wrapper-ldflags="-mt" \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc

Furthermore I used the following test program.

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  MPI_Init (&argc, &argv);
  printf ("Hello!\n");
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
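
In case it helps to narrow down which of the six processes actually get
past MPI_Init, here is a slightly extended variant of the test program
(only a sketch; it merely adds the standard calls MPI_Comm_rank,
MPI_Comm_size, and MPI_Get_processor_name to print the rank and host name):

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int  rank, size, len;
  char host[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  MPI_Comm_size (MPI_COMM_WORLD, &size);
  MPI_Get_processor_name (host, &len);
  /* report which process reached this point and where it runs */
  printf ("Hello from rank %d of %d on %s!\n", rank, size, host);
  MPI_Finalize ();
  return EXIT_SUCCESS;
}

Run the same way with "mpiexec -np 6 ...", the missing "Hello from ..."
lines would show which ranks never reach the printf.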



Kind regards

Siegmar
