On Jan 13, 2006, at 10:41 PM, Glenn Morris wrote:

The combination of OpenMP and Open MPI works fine if I restrict the
application to only 1 OpenMP thread per MPI process (in other words,
the code at least compiles and runs fine with both enabled, in this
limited sense). If I try to use my desired value of 4 OpenMP threads,
it crashes. It works fine, however, if I use MPICH as the MPI
implementation.
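
For reference, a minimal hybrid program in this style looks roughly
like the sketch below (plain C using the standard MPI and OpenMP
APIs; an illustration, not the actual application):

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;

        MPI_Init(&argc, &argv);               /* one MPI process per host */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* each MPI process fans out into 4 OpenMP threads; no MPI
           calls are made inside the parallel region */
        #pragma omp parallel num_threads(4)
        printf("rank %d, thread %d\n", rank, omp_get_thread_num());

        MPI_Finalize();
        return 0;
    }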

The hostfile specifies "slots=4 max-slots=4" for each host (trying to
lie and say "slots=1" did not help), and I use "-np 4 --bynode" to get
only one MPI process per host. I'm using ssh over Gbit ethernet
between hosts.
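
For reference, the hostfile entries look like this (only "coma006" is
a real node name from the run below; the other hostnames are
placeholders):

    coma006 slots=4 max-slots=4
    coma007 slots=4 max-slots=4
    coma008 slots=4 max-slots=4
    coma009 slots=4 max-slots=4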

There is no useful error message that I can see. Watching top, I can
see that the processes are spawned on the four hosts, each splits
into 4 OpenMP threads, and then they crash immediately. The only
error message is:

    mpirun noticed that job rank 0 with PID 30243 on node "coma006"
    exited on signal 11.
    Broken pipe

It looks like your application is dying from a segmentation fault. The question is whether Open MPI caused the segfault or whether there is something in your application that Open MPI didn't like. It would be useful to get a stack trace from the process that is segfaulting. Since you're only running 4 processes and using ssh to start them, the easiest way is to start each process under gdb in an xterm. You need ssh X forwarding enabled for this trick to work, but then running something like:

    mpirun -np 4 --bynode -d xterm -e gdb <myapp>

should pop up 4 xterm windows, one for each process. Type "run" at each gdb prompt and the job should be off and running.
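
When a process segfaults, gdb stops at the faulting instruction, and
a backtrace there shows where it died. A typical session looks
roughly like this (output abridged):

    (gdb) run
    ...
    Program received signal SIGSEGV, Segmentation fault.
    ...
    (gdb) bt

The output of "bt" is the stack trace we'd want to see.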

If this would be a major pain, the other option is to try the nightly build of Open MPI from the trunk, as it will try to print a stack trace when errors like the one above occur. But I would start with trying the gdb method. Of course, if you have TotalView or another parallel debugger, that would be even easier.

Brian


--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

