Re: [OMPI users] processes hang with openmpi-dev-602-g82c02b4

2014-12-24 Thread Gilles Gouaillardet
Siegmar,

Could you please give the attached patch a try?
/* and keep in mind this is just a workaround that happens to work */

Cheers,

Gilles

On 2014/12/22 22:48, Siegmar Gross wrote:
> Hi,
>
> today I installed openmpi-dev-602-g82c02b4 on my machines (Solaris 10 Sparc,
> Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and the
> new Solaris Studio 12.4 compilers. All build processes finished without
> errors, but I have a problem running a very small program. It works for
> three processes but hangs for six processes. I have the same behaviour
> for both compilers.
>
> tyr small_prog 139 time; mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize; time
> 827.161u 210.126s 30:51.08 56.0%0+0k 4151+20io 2898pf+0w
> Hello!
> Hello!
> Hello!
> 827.886u 210.335s 30:54.68 55.9%0+0k 4151+20io 2898pf+0w
> tyr small_prog 140 time; mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize; time
> 827.946u 210.370s 31:15.02 55.3%0+0k 4151+20io 2898pf+0w
> ^CKilled by signal 2.
> Killed by signal 2.
> 869.242u 221.644s 33:40.54 53.9%0+0k 4151+20io 2898pf+0w
> tyr small_prog 141 
>
> tyr small_prog 145 ompi_info | grep -e "Open MPI repo revision:" -e "C compiler:"
>   Open MPI repo revision: dev-602-g82c02b4
>   C compiler: cc
> tyr small_prog 146 
>
>
> tyr small_prog 146 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
> GNU gdb (GDB) 7.6.1
> ...
> (gdb) run -np 3 --host sunpc1,linpc1,tyr init_finalize
> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP2]
> Hello!
> Hello!
> Hello!
> [LWP2 exited]
> [New Thread 2]
> [Switching to Thread 1 (LWP 1)]
> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
> (gdb) run -np 6 --host sunpc1,linpc1,tyr init_finalize
> The program being debugged has been started already.
> Start it from the beginning? (y or n) y
>
> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP2]
> ^CKilled by signal 2.
> Killed by signal 2.
>
> Program received signal SIGINT, Interrupt.
> [Switching to Thread 1 (LWP 1)]
> 0x7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
> (gdb) bt
> #0  0x7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
> #1  0x7d1cb468 in _pollsys () from /lib/sparcv9/libc.so.1
> #2  0x7d170ed8 in poll () from /lib/sparcv9/libc.so.1
> #3  0x7e69a630 in poll_dispatch ()
>from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
> #4  0x7e6894ec in opal_libevent2021_event_base_loop ()
>from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
> #5  0x0001eb14 in orterun (argc=1757447168, argv=0xff7ed8550cff)
> at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/orterun.c:1090
> #6  0x00014e2c in main (argc=256, argv=0xff7ed8af5c00)
> at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/main.c:13
> (gdb) 
>
> Any ideas? Unfortunately I'm leaving for vacation, so I cannot test
> any patches until the end of the year. Nevertheless I wanted to report the
> problem. At the moment I cannot test whether I see the same behaviour in a
> homogeneous environment with three machines, because the new version won't
> be available on the other machines before tomorrow. I used the following
> configure command.
>
> ../openmpi-dev-602-g82c02b4/configure --prefix=/usr/local/openmpi-1.9.0_64_cc \
>   --libdir=/usr/local/openmpi-1.9.0_64_cc/lib64 \
>   --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
>   --with-jdk-headers=/usr/local/jdk1.8.0/include \
>   JAVA_HOME=/usr/local/jdk1.8.0 \
>   LDFLAGS="-m64 -mt" \
>   CC="cc" CXX="CC" FC="f95" \
>   CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
>   CPP="cpp" CXXCPP="cpp" \
>   CPPFLAGS="" CXXCPPFLAGS="" \
>   --enable-mpi-cxx \
>   --enable-cxx-exceptions \
>   --enable-mpi-java \
>   --enable-heterogeneous \
>   --enable-mpi-thread-multiple \
>   --with-threads=posix \
>   --with-hwloc=internal \
>   --without-verbs \
>   --with-wrapper-cflags="-m64 -mt" \
>   --with-wrapper-cxxflags="-m64 -library=stlport4" \
>   --with-wrapper-ldflags="-mt" \
>   --enable-debug \
>   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>
> Furthermore I used the following test program.
>
> #include <stdio.h>
> #include <stdlib.h>
> #include "mpi.h"
>
> int main (int argc, char *argv[])
> {
>   MPI_Init (&argc, &argv);
>   printf ("Hello!\n");
>   MPI_Finalize ();
>   return EXIT_SUCCESS;
> }
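For reference, the test program above can be built with Open MPI's `mpicc` wrapper compiler and launched exactly as in the transcripts; a sketch that assumes the source file is named `init_finalize.c` (inferred from the binary name in the report, not stated in the original):

```shell
# Compile with the Open MPI wrapper compiler produced by the configure
# command quoted above (Solaris Studio cc with -m64 in the reported build).
mpicc -m64 -o init_finalize init_finalize.c

# One rank per listed host; this is the 3-process case that works.
mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize
```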
>
>
>
> Kind regards
>
> Siegmar
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community

Re: [OMPI users] processes hang with openmpi-dev-602-g82c02b4

2014-12-24 Thread Gilles Gouaillardet
Kawashima-san,

I'd rather consider this a bug in the README (!)

Heterogeneous support has been broken for some time, but it was
eventually fixed.

The truth is that there are *very* limited resources (both human and
hardware) maintaining heterogeneous support, but that does not mean
heterogeneous support should not be used, nor that bug reports will
be ignored.

Cheers,

Gilles

On 2014/12/24 9:26, Kawashima, Takahiro wrote:
> Hi Siegmar,
>
> Heterogeneous environment is not supported officially.
>
> README of Open MPI master says:
>
> --enable-heterogeneous
>   Enable support for running on heterogeneous clusters (e.g., machines
>   with different endian representations).  Heterogeneous support is
>   disabled by default because it imposes a minor performance penalty.
>
>   *** THIS FUNCTIONALITY IS CURRENTLY BROKEN - DO NOT USE ***
>
>> Hi,
>>
>> today I installed openmpi-dev-602-g82c02b4 on my machines (Solaris 10 Sparc,
>> Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and the
>> new Solaris Studio 12.4 compilers. All build processes finished without
>> errors, but I have a problem running a very small program. It works for
>> three processes but hangs for six processes. I have the same behaviour
>> for both compilers.
>>
>> tyr small_prog 139 time; mpiexec -np 3 --host sunpc1,linpc1,tyr 
>> init_finalize; time
>> 827.161u 210.126s 30:51.08 56.0%0+0k 4151+20io 2898pf+0w
>> Hello!
>> Hello!
>> Hello!
>> 827.886u 210.335s 30:54.68 55.9%0+0k 4151+20io 2898pf+0w
>> tyr small_prog 140 time; mpiexec -np 6 --host sunpc1,linpc1,tyr 
>> init_finalize; time
>> 827.946u 210.370s 31:15.02 55.3%0+0k 4151+20io 2898pf+0w
>> ^CKilled by signal 2.
>> Killed by signal 2.
>> 869.242u 221.644s 33:40.54 53.9%0+0k 4151+20io 2898pf+0w
>> tyr small_prog 141 
>>
>> tyr small_prog 145 ompi_info | grep -e "Open MPI repo revision:" -e "C 
>> compiler:"
>>   Open MPI repo revision: dev-602-g82c02b4
>>   C compiler: cc
>> tyr small_prog 146 
>>
>>
>> tyr small_prog 146 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
>> GNU gdb (GDB) 7.6.1
>> ...
>> (gdb) run -np 3 --host sunpc1,linpc1,tyr init_finalize
>> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 3 --host 
>> sunpc1,linpc1,tyr 
>> init_finalize
>> [Thread debugging using libthread_db enabled]
>> [New Thread 1 (LWP 1)]
>> [New LWP2]
>> Hello!
>> Hello!
>> Hello!
>> [LWP2 exited]
>> [New Thread 2]
>> [Switching to Thread 1 (LWP 1)]
>> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to 
>> satisfy query
>> (gdb) run -np 6 --host sunpc1,linpc1,tyr init_finalize
>> The program being debugged has been started already.
>> Start it from the beginning? (y or n) y
>>
>> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 6 --host 
>> sunpc1,linpc1,tyr 
>> init_finalize
>> [Thread debugging using libthread_db enabled]
>> [New Thread 1 (LWP 1)]
>> [New LWP2]
>> ^CKilled by signal 2.
>> Killed by signal 2.
>>
>> Program received signal SIGINT, Interrupt.
>> [Switching to Thread 1 (LWP 1)]
>> 0x7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
>> (gdb) bt
>> #0  0x7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
>> #1  0x7d1cb468 in _pollsys () from /lib/sparcv9/libc.so.1
>> #2  0x7d170ed8 in poll () from /lib/sparcv9/libc.so.1
>> #3  0x7e69a630 in poll_dispatch ()
>>from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
>> #4  0x7e6894ec in opal_libevent2021_event_base_loop ()
>>from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
>> #5  0x0001eb14 in orterun (argc=1757447168, argv=0xff7ed8550cff)
>> at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/orterun.c:1090
>> #6  0x00014e2c in main (argc=256, argv=0xff7ed8af5c00)
>> at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/main.c:13
>> (gdb) 
>>
>> Any ideas? Unfortunately I'm leaving for vaccation so that I cannot test
>> any patches until the end of the year. Neverthess I wanted to report the
>> problem. At the moment I cannot test if I have the same behaviour in a
>> homogeneous environment with three machines because the new version isn't
>> available before tomorrow on the other machines. I used the following
>> configure command.
>>
>> ../openmpi-dev-602-g82c02b4/configure 
>> --prefix=/usr/local/openmpi-1.9.0_64_cc \
>>   --libdir=/usr/local/openmpi-1.9.0_64_cc/lib64 \
>>   --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
>>   --with-jdk-headers=/usr/local/jdk1.8.0/include \
>>   JAVA_HOME=/usr/local/jdk1.8.0 \
>>   LDFLAGS="-m64 -mt" \
>>   CC="cc" CXX="CC" FC="f95" \
>>   CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
>>   CPP="cpp" CXXCPP="cpp" \
>>   CPPFLAGS="" CXXCPPFLAGS="" \
>>   --enable-mpi-cxx \
>>   --enable-cxx-exceptions \
>>   --enable-mpi-java \
>>   --enable-heterogeneous \
>>   --enable-mpi-thread-multiple \
>>   --with-threads=po

Re: [OMPI users] processes hang with openmpi-dev-602-g82c02b4

2014-12-24 Thread Kawashima, Takahiro
Gilles,

Ahh, I didn't know the current status. Thank you for the notice!

Thanks,
Takahiro Kawashima


Re: [OMPI users] processes hang with openmpi-dev-602-g82c02b4

2014-12-24 Thread Siegmar Gross

Hi Gilles,

On 2014-12-24 08:09, Gilles Gouaillardet wrote:

Siegmar,

Could you please give the attached patch a try?
/* and keep in mind this is just a workaround that happens to work */


At the moment I can only read and answer email with my iPad. I will
try your patch next year when I'm back in my office.


Thank you very much for your help, Merry Christmas, and a Happy New Year


Siegmar





Re: [OMPI users] processes hang with openmpi-dev-602-g82c02b4

2014-12-24 Thread Ralph Castain
I'd be a little cautious here; I'm not sure that hetero operations are
completely fixed. The README is probably a bit overstated (reflecting an
earlier state), but I'm certain we haven't extensively tested hetero
operations, and I suspect there are still lingering issues.

