Re: [OMPI users] processes hang with openmpi-dev-602-g82c02b4
Siegmar,

could you please give a try to the attached patch?
/* and keep in mind this is just a workaround that happens to work */

Cheers,

Gilles

On 2014/12/22 22:48, Siegmar Gross wrote:
> Hi,
>
> today I installed openmpi-dev-602-g82c02b4 on my machines (Solaris 10 Sparc,
> Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and the
> new Solaris Studio 12.4 compilers. All build processes finished without
> errors, but I have a problem running a very small program. It works for
> three processes but hangs for six processes. I have the same behaviour
> for both compilers.
>
> tyr small_prog 139 time; mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize; time
> 827.161u 210.126s 30:51.08 56.0% 0+0k 4151+20io 2898pf+0w
> Hello!
> Hello!
> Hello!
> 827.886u 210.335s 30:54.68 55.9% 0+0k 4151+20io 2898pf+0w
> tyr small_prog 140 time; mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize; time
> 827.946u 210.370s 31:15.02 55.3% 0+0k 4151+20io 2898pf+0w
> ^CKilled by signal 2.
> Killed by signal 2.
> 869.242u 221.644s 33:40.54 53.9% 0+0k 4151+20io 2898pf+0w
> tyr small_prog 141
>
> tyr small_prog 145 ompi_info | grep -e "Open MPI repo revision:" -e "C compiler:"
>   Open MPI repo revision: dev-602-g82c02b4
>   C compiler: cc
> tyr small_prog 146
>
> tyr small_prog 146 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
> GNU gdb (GDB) 7.6.1
> ...
> (gdb) run -np 3 --host sunpc1,linpc1,tyr init_finalize
> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP 2]
> Hello!
> Hello!
> Hello!
> [LWP 2 exited]
> [New Thread 2]
> [Switching to Thread 1 (LWP 1)]
> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
> (gdb) run -np 6 --host sunpc1,linpc1,tyr init_finalize
> The program being debugged has been started already.
> Start it from the beginning? (y or n) y
>
> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP 2]
> ^CKilled by signal 2.
> Killed by signal 2.
>
> Program received signal SIGINT, Interrupt.
> [Switching to Thread 1 (LWP 1)]
> 0x7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
> (gdb) bt
> #0  0x7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
> #1  0x7d1cb468 in _pollsys () from /lib/sparcv9/libc.so.1
> #2  0x7d170ed8 in poll () from /lib/sparcv9/libc.so.1
> #3  0x7e69a630 in poll_dispatch ()
>     from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
> #4  0x7e6894ec in opal_libevent2021_event_base_loop ()
>     from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
> #5  0x0001eb14 in orterun (argc=1757447168, argv=0xff7ed8550cff)
>     at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/orterun.c:1090
> #6  0x00014e2c in main (argc=256, argv=0xff7ed8af5c00)
>     at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/main.c:13
> (gdb)
>
> Any ideas? Unfortunately I'm leaving for vacation so that I cannot test
> any patches until the end of the year. Nevertheless I wanted to report the
> problem. At the moment I cannot test if I have the same behaviour in a
> homogeneous environment with three machines because the new version isn't
> available before tomorrow on the other machines. I used the following
> configure command.
>
> ../openmpi-dev-602-g82c02b4/configure --prefix=/usr/local/openmpi-1.9.0_64_cc \
>   --libdir=/usr/local/openmpi-1.9.0_64_cc/lib64 \
>   --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
>   --with-jdk-headers=/usr/local/jdk1.8.0/include \
>   JAVA_HOME=/usr/local/jdk1.8.0 \
>   LDFLAGS="-m64 -mt" \
>   CC="cc" CXX="CC" FC="f95" \
>   CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
>   CPP="cpp" CXXCPP="cpp" \
>   CPPFLAGS="" CXXCPPFLAGS="" \
>   --enable-mpi-cxx \
>   --enable-cxx-exceptions \
>   --enable-mpi-java \
>   --enable-heterogeneous \
>   --enable-mpi-thread-multiple \
>   --with-threads=posix \
>   --with-hwloc=internal \
>   --without-verbs \
>   --with-wrapper-cflags="-m64 -mt" \
>   --with-wrapper-cxxflags="-m64 -library=stlport4" \
>   --with-wrapper-ldflags="-mt" \
>   --enable-debug \
>   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>
> Furthermore I used the following test program.
>
> #include <stdio.h>
> #include <stdlib.h>
> #include "mpi.h"
>
> int main (int argc, char *argv[])
> {
>   MPI_Init (&argc, &argv);
>   printf ("Hello!\n");
>   MPI_Finalize ();
>   return EXIT_SUCCESS;
> }
>
> Kind regards
>
> Siegmar
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] processes hang with openmpi-dev-602-g82c02b4
Kawashima-san,

I'd rather consider this a bug in the README (!)

Heterogeneous support had been broken for some time, but it was
eventually fixed.

The truth is there are *very* limited resources (both human and hardware)
maintaining heterogeneous support, but that does not mean heterogeneous
support should not be used, nor that bug reports will be ignored.

Cheers,

Gilles

On 2014/12/24 9:26, Kawashima, Takahiro wrote:
> Hi Siegmar,
>
> Heterogeneous environments are not officially supported.
>
> The README of Open MPI master says:
>
>   --enable-heterogeneous
>     Enable support for running on heterogeneous clusters (e.g., machines
>     with different endian representations). Heterogeneous support is
>     disabled by default because it imposes a minor performance penalty.
>
>     *** THIS FUNCTIONALITY IS CURRENTLY BROKEN - DO NOT USE ***
>
> [Siegmar's original report quoted in full; snipped]
Re: [OMPI users] processes hang with openmpi-dev-602-g82c02b4
Gilles,

Ahh, I didn't know the current status. Thank you for the notice!

Thanks,
Takahiro Kawashima

> Kawashima-san,
>
> I'd rather consider this a bug in the README (!)
>
> Heterogeneous support had been broken for some time, but it was
> eventually fixed.
>
> The truth is there are *very* limited resources (both human and hardware)
> maintaining heterogeneous support, but that does not mean heterogeneous
> support should not be used, nor that bug reports will be ignored.
>
> Cheers,
>
> Gilles
>
> [earlier quoted messages snipped]
Re: [OMPI users] processes hang with openmpi-dev-602-g82c02b4
Hi Gilles,

On 2014-12-24 08:09, Gilles Gouaillardet wrote:
> Siegmar,
>
> could you please give a try to the attached patch?
> /* and keep in mind this is just a workaround that happens to work */

At the moment I can only read and answer email with my iPad. I will try
your patch next year when I'm back in my office.

Thank you very much for your help, Merry Christmas, and a Happy New Year.

Siegmar

> Cheers,
>
> Gilles
>
> [earlier quoted messages snipped]
Re: [OMPI users] processes hang with openmpi-dev-602-g82c02b4
I'd be a little cautious here - I'm not sure that hetero operations are
completely fixed. The README is probably a bit over-stated (reflecting an
earlier state), but I'm certain we haven't extensively tested hetero
operations, and I suspect there are still lingering issues.

> On Dec 23, 2014, at 11:29 PM, Kawashima, Takahiro wrote:
>
> Gilles,
>
> Ahh, I didn't know the current status. Thank you for the notice!
>
> Thanks,
> Takahiro Kawashima
>
> [earlier quoted messages snipped]