Hi Gilles,

On 2014-12-24 08:09, Gilles Gouaillardet wrote:
Siegmar,

could you please try the attached patch?
/* and keep in mind this is just a workaround that happens to work */

At the moment I can only read and answer email with my iPad. I will
try your patch next year when I'm back in my office.


Thank you very much for your help, Merry Christmas, and a Happy New Year

Siegmar



Cheers,

Gilles

On 2014/12/22 22:48, Siegmar Gross wrote:
Hi,

today I installed openmpi-dev-602-g82c02b4 on my machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and the new Solaris Studio 12.4 compilers. All build processes finished without errors, but I have a problem running a very small program. It works for three processes but hangs for six processes. I see the same behaviour with both compilers.

tyr small_prog 139 time; mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize; time
827.161u 210.126s 30:51.08 56.0%        0+0k 4151+20io 2898pf+0w
Hello!
Hello!
Hello!
827.886u 210.335s 30:54.68 55.9%        0+0k 4151+20io 2898pf+0w
tyr small_prog 140 time; mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize; time
827.946u 210.370s 31:15.02 55.3%        0+0k 4151+20io 2898pf+0w
^CKilled by signal 2.
Killed by signal 2.
869.242u 221.644s 33:40.54 53.9%        0+0k 4151+20io 2898pf+0w
tyr small_prog 141

tyr small_prog 145 ompi_info | grep -e "Open MPI repo revision:" -e "C compiler:"
  Open MPI repo revision: dev-602-g82c02b4
              C compiler: cc
tyr small_prog 146


tyr small_prog 146 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
GNU gdb (GDB) 7.6.1
...
(gdb) run -np 3 --host sunpc1,linpc1,tyr init_finalize
Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 3 --host sunpc1,linpc1,tyr init_finalize
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
Hello!
Hello!
Hello!
[LWP    2         exited]
[New Thread 2        ]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
(gdb) run -np 6 --host sunpc1,linpc1,tyr init_finalize
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 6 --host sunpc1,linpc1,tyr init_finalize
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
^CKilled by signal 2.
Killed by signal 2.

Program received signal SIGINT, Interrupt.
[Switching to Thread 1 (LWP 1)]
0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
(gdb) bt
#0  0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
#1  0xffffffff7d1cb468 in _pollsys () from /lib/sparcv9/libc.so.1
#2  0xffffffff7d170ed8 in poll () from /lib/sparcv9/libc.so.1
#3  0xffffffff7e69a630 in poll_dispatch ()
   from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
#4  0xffffffff7e6894ec in opal_libevent2021_event_base_loop ()
   from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
#5  0x000000010000eb14 in orterun (argc=1757447168, argv=0xffffff7ed8550cff)
   at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/orterun.c:1090
#6  0x0000000100004e2c in main (argc=256, argv=0xffffff7ed8af5c00)
   at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/main.c:13
(gdb)

Any ideas? Unfortunately I'm leaving for vacation, so I cannot test any patches until the end of the year. Nevertheless, I wanted to report the problem. At the moment I cannot test whether I see the same behaviour in a homogeneous environment with three machines, because the new version won't be available on the other machines before tomorrow. I used the following configure command.

../openmpi-dev-602-g82c02b4/configure --prefix=/usr/local/openmpi-1.9.0_64_cc \
  --libdir=/usr/local/openmpi-1.9.0_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
  --with-jdk-headers=/usr/local/jdk1.8.0/include \
  JAVA_HOME=/usr/local/jdk1.8.0 \
  LDFLAGS="-m64 -mt" \
  CC="cc" CXX="CC" FC="f95" \
  CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  CPPFLAGS="" CXXCPPFLAGS="" \
  --enable-mpi-cxx \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-heterogeneous \
  --enable-mpi-thread-multiple \
  --with-threads=posix \
  --with-hwloc=internal \
  --without-verbs \
  --with-wrapper-cflags="-m64 -mt" \
  --with-wrapper-cxxflags="-m64 -library=stlport4" \
  --with-wrapper-ldflags="-mt" \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc

Furthermore I used the following test program.

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  MPI_Init (&argc, &argv);
  printf ("Hello!\n");
  MPI_Finalize ();
  return EXIT_SUCCESS;
}



Kind regards

Siegmar

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/26052.php
