Hi, today I installed openmpi-dev-602-g82c02b4 on my machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and the new Solaris Studio 12.4 compilers. All build processes finished without errors, but I have a problem running a very small program: it works with three processes but hangs with six. I see the same behaviour with both compilers.
tyr small_prog 139 time; mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize; time
827.161u 210.126s 30:51.08 56.0% 0+0k 4151+20io 2898pf+0w
Hello!
Hello!
Hello!
827.886u 210.335s 30:54.68 55.9% 0+0k 4151+20io 2898pf+0w
tyr small_prog 140 time; mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize; time
827.946u 210.370s 31:15.02 55.3% 0+0k 4151+20io 2898pf+0w
^CKilled by signal 2.
Killed by signal 2.
869.242u 221.644s 33:40.54 53.9% 0+0k 4151+20io 2898pf+0w
tyr small_prog 141

tyr small_prog 145 ompi_info | grep -e "Open MPI repo revision:" -e "C compiler:"
  Open MPI repo revision: dev-602-g82c02b4
              C compiler: cc

tyr small_prog 146 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
GNU gdb (GDB) 7.6.1
...
(gdb) run -np 3 --host sunpc1,linpc1,tyr init_finalize
Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2]
Hello!
Hello!
Hello!
[LWP 2 exited]
[New Thread 2]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
(gdb) run -np 6 --host sunpc1,linpc1,tyr init_finalize
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2]
^CKilled by signal 2.
Killed by signal 2.

Program received signal SIGINT, Interrupt.
[Switching to Thread 1 (LWP 1)]
0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
(gdb) bt
#0  0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
#1  0xffffffff7d1cb468 in _pollsys () from /lib/sparcv9/libc.so.1
#2  0xffffffff7d170ed8 in poll () from /lib/sparcv9/libc.so.1
#3  0xffffffff7e69a630 in poll_dispatch () from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
#4  0xffffffff7e6894ec in opal_libevent2021_event_base_loop () from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
#5  0x000000010000eb14 in orterun (argc=1757447168, argv=0xffffff7ed8550cff) at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/orterun.c:1090
#6  0x0000000100004e2c in main (argc=256, argv=0xffffff7ed8af5c00) at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/main.c:13
(gdb)

Any ideas? Unfortunately I'm leaving for vacation, so I cannot test any patches until the end of the year. Nevertheless I wanted to report the problem. At the moment I cannot check whether I get the same behaviour in a homogeneous environment with three machines, because the new version won't be available on the other machines before tomorrow. I used the following configure command.
../openmpi-dev-602-g82c02b4/configure --prefix=/usr/local/openmpi-1.9.0_64_cc \
  --libdir=/usr/local/openmpi-1.9.0_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
  --with-jdk-headers=/usr/local/jdk1.8.0/include \
  JAVA_HOME=/usr/local/jdk1.8.0 \
  LDFLAGS="-m64 -mt" \
  CC="cc" CXX="CC" FC="f95" \
  CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  CPPFLAGS="" CXXCPPFLAGS="" \
  --enable-mpi-cxx \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-heterogeneous \
  --enable-mpi-thread-multiple \
  --with-threads=posix \
  --with-hwloc=internal \
  --without-verbs \
  --with-wrapper-cflags="-m64 -mt" \
  --with-wrapper-cxxflags="-m64 -library=stlport4" \
  --with-wrapper-ldflags="-mt" \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc

Furthermore I used the following test program.

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  MPI_Init (&argc, &argv);
  printf ("Hello!\n");
  MPI_Finalize ();
  return EXIT_SUCCESS;
}

Kind regards

Siegmar