On Jan 13, 2006, at 10:41 PM, Glenn Morris wrote:
The combination OpenMP + OpenMPI works fine if I restrict the
application to only 1 OpenMP thread per MPI process (in other words
the code at least compiles and runs fine with both options on, in this
limited sense). If I try to use my desired value of 4 OpenMP threads,
it crashes. It works fine, however, if I use MPICH for the MPI
implementation.
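For reference, a stripped-down sketch of this kind of hybrid program (not
the actual application, just a minimal test along the same lines) would
look roughly like:

  #include <stdio.h>
  #include <mpi.h>
  #include <omp.h>

  int main(int argc, char **argv)
  {
      int provided, rank;

      /* ask for threaded MPI; 'provided' reports what the library supports */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* each MPI process spawns 4 OpenMP threads, as in the failing case */
      #pragma omp parallel num_threads(4)
      {
          printf("rank %d, thread %d of %d\n",
                 rank, omp_get_thread_num(), omp_get_num_threads());
      }

      MPI_Finalize();
      return 0;
  }

built with mpicc plus the compiler's OpenMP flag.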
The hostfile specifies "slots=4 max-slots=4" for each host (trying to
lie and say "slots=1" did not help), and I use "-np 4 --bynode" to get
only one MPI process per host. I'm using ssh over Gbit ethernet
between hosts.
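For reference, the hostfile and launch line look roughly like the
following (host names other than coma006 are placeholders):

  coma006 slots=4 max-slots=4
  coma007 slots=4 max-slots=4
  coma008 slots=4 max-slots=4
  coma009 slots=4 max-slots=4

  mpirun -np 4 --bynode --hostfile <hostfile> <myapp>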
There is no useful error message that I can see. Watching top, I can
see that processes are spawned on the four hosts, split into 4 OpenMP
threads, and then crash immediately. The only error message is:
mpirun noticed that job rank 0 with PID 30243 on node "coma006"
exited on signal 11.
Broken pipe
It looks like your application is dying from a segmentation fault.
The question is whether Open MPI caused the segfault or whether there is
something in your application that Open MPI didn't like. It would be
useful to get a stack trace from the process that is causing the
segfault. Since you're only running 4 processes and using ssh to
start them, the easiest way is to start each process under gdb in an
xterm. You have to have ssh X forwarding enabled for this trick to
work, but then running something like:
mpirun -np 4 --bynode -d xterm -e gdb <myapp>
should pop up 4 xterm windows, one for each process. Type "run" in
each gdb session in the xterms and it should be off and running.
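Once one of the processes hits the segfault, something like the following
in the corresponding gdb session should show where it died, including the
OpenMP threads (commands only; the exact output will vary):

  (gdb) run
  ... wait for the SIGSEGV ...
  (gdb) backtrace
  (gdb) thread apply all bt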
If this would be a major pain, the other option is to try the nightly
build of Open MPI from the trunk, as it will try to print a stack
trace when errors like the one above occur. But I would start with
trying the gdb method. Of course, if you have TotalView or another
parallel debugger, that would be even easier.
Brian
--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/