Hi,

today I installed openmpi-dev-602-g82c02b4 on my machines (Solaris 10 Sparc,
Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and the
new Solaris Studio 12.4 compilers. All build processes finished without
errors, but I have a problem running a very small program: it works for
three processes but hangs for six processes. I have the same behaviour
with both compilers.
tyr small_prog 139 time; mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize; time
827.161u 210.126s 30:51.08 56.0% 0+0k 4151+20io 2898pf+0w
Hello!
Hello!
Hello!
827.886u 210.335s 30:54.68 55.9% 0+0k 4151+20io 2898pf+0w
tyr small_prog 140 time; mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize; time
827.946u 210.370s 31:15.02 55.3% 0+0k 4151+20io 2898pf+0w
^CKilled by signal 2.
Killed by signal 2.
869.242u 221.644s 33:40.54 53.9% 0+0k 4151+20io 2898pf+0w
tyr small_prog 141
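
Since I cannot dig deeper before the end of the year, here is the
narrowing-down test I would try next (hypothetical invocations using only
the mpiexec options shown above; slot handling for repeated host names may
differ between versions): keep six ranks but change the host mix, to see
whether the hang needs the heterogeneous Sparc/x86_64 combination or
already occurs with more than one rank per node.

# six ranks on the local Sparc machine only (no heterogeneity involved)
mpiexec -np 6 --host tyr,tyr,tyr,tyr,tyr,tyr init_finalize

# six ranks on the two x86_64 machines only (homogeneous little-endian pair)
mpiexec -np 6 --host sunpc1,sunpc1,sunpc1,linpc1,linpc1,linpc1 init_finalize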
tyr small_prog 145 ompi_info | grep -e "Open MPI repo revision:" -e "C compiler:"
Open MPI repo revision: dev-602-g82c02b4
C compiler: cc
tyr small_prog 146
tyr small_prog 146 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
GNU gdb (GDB) 7.6.1
...
(gdb) run -np 3 --host sunpc1,linpc1,tyr init_finalize
Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
Hello!
Hello!
Hello!
[LWP 2 exited]
[New Thread 2 ]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
(gdb) run -np 6 --host sunpc1,linpc1,tyr init_finalize
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
^CKilled by signal 2.
Killed by signal 2.
Program received signal SIGINT, Interrupt.
[Switching to Thread 1 (LWP 1)]
0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
(gdb) bt
#0 0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
#1 0xffffffff7d1cb468 in _pollsys () from /lib/sparcv9/libc.so.1
#2 0xffffffff7d170ed8 in poll () from /lib/sparcv9/libc.so.1
#3 0xffffffff7e69a630 in poll_dispatch ()
from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
#4 0xffffffff7e6894ec in opal_libevent2021_event_base_loop ()
from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
#5  0x000000010000eb14 in orterun (argc=1757447168, argv=0xffffff7ed8550cff)
    at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/orterun.c:1090
#6  0x0000000100004e2c in main (argc=256, argv=0xffffff7ed8af5c00)
    at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/main.c:13
(gdb)
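
If the main-thread backtrace alone is not enough, the following plain gdb
commands (nothing Open-MPI-specific) should show what every thread of the
hung mpiexec is waiting on; I can collect that output when I am back:

(gdb) info threads
(gdb) thread apply all bt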
Any ideas? Unfortunately I'm leaving for vacation, so I cannot test any
patches until the end of the year. Nevertheless, I wanted to report the
problem. At the moment I cannot test whether I have the same behaviour in
a homogeneous environment with three machines, because the new version
won't be available on the other machines before tomorrow. I used the
following configure command.
../openmpi-dev-602-g82c02b4/configure
--prefix=/usr/local/openmpi-1.9.0_64_cc \
--libdir=/usr/local/openmpi-1.9.0_64_cc/lib64 \
--with-jdk-bindir=/usr/local/jdk1.8.0/bin \
--with-jdk-headers=/usr/local/jdk1.8.0/include \
JAVA_HOME=/usr/local/jdk1.8.0 \
LDFLAGS="-m64 -mt" \
CC="cc" CXX="CC" FC="f95" \
CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
CPP="cpp" CXXCPP="cpp" \
CPPFLAGS="" CXXCPPFLAGS="" \
--enable-mpi-cxx \
--enable-cxx-exceptions \
--enable-mpi-java \
--enable-heterogeneous \
--enable-mpi-thread-multiple \
--with-threads=posix \
--with-hwloc=internal \
--without-verbs \
--with-wrapper-cflags="-m64 -mt" \
--with-wrapper-cxxflags="-m64 -library=stlport4" \
--with-wrapper-ldflags="-mt" \
--enable-debug \
|& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
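
For completeness, whether --enable-heterogeneous really took effect in the
installed build can be checked roughly like this (I have not double-checked
the exact wording of the ompi_info output line, so treat it as a sketch):

ompi_info | grep -i hetero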
Furthermore I used the following test program.
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  MPI_Init (&argc, &argv);
  printf ("Hello!\n");
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
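
For anyone who wants to reproduce this: the program compiles and runs with
the wrapper compiler from the installation above, for example (assuming the
source file is called init_finalize.c):

mpicc -o init_finalize init_finalize.c
mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize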
Kind regards
Siegmar