George Bosilca wrote:
[.....]
I don't think the root crashed. I guess that one of the other nodes
crashed, the root got a bad socket (which is what the first error
message seems to indicate), and got terminated. As the output is not
synchronized between the nodes, one cannot rely on its order or
contents. Moreover, mpirun reports that the root was killed with signal
15, which is how we clean up the remaining processes when we detect
that something really bad (like a seg fault) happened in the parallel
application.
Sorry, I should have phrased that as a question ("is it the root?").
I'm not that familiar with the debug output of OpenMPI yet, so I
included it in case somebody could make more sense of it than I did.
There are many differences between the rooted and non-rooted
collectives. All errors that you reported so far are related to rooted
collectives, which makes sense. I didn't say that it is normal for
Open MPI to misbehave. I wonder if you can get such errors with
non-rooted collectives (such as allreduce, allgather and alltoall), or
with messages larger than the eager size?
You're right, I haven't seen any crashes with the All*-variants.
TCP eager limit is set to 65536 (output from ompi_info):
MCA btl: parameter "btl_tcp_eager_limit" (current value: "65536")
MCA btl: parameter "btl_tcp_min_send_size" (current value: "65536")
MCA btl: parameter "btl_tcp_max_send_size" (current value: "131072")
I observed crashes with Broadcasts and Reduces of 131072 bytes. I'm
playing around with larger messages now, and while Reduce with 16 nodes
seems stable at 262144-byte messages, it still crashes with 44 nodes.
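For reference, the inner loop of my test is basically just the chosen
rooted collective repeated back to back, with nothing else in between.
The real ompi-crash takes the count, benchmark number and iteration
count on the command line (as in the runs further down); the following
is only a minimal stand-alone sketch of that pattern with the sizes
hard-coded, not the actual source:

    /* reduce-stress.c: hammer a rooted collective in a tight loop.
     * Sketch only; the sizes mirror one of the runs shown below. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const int count = 4096;   /* ints per operation (16384 bytes) */
        const int iters = 3000;
        int rank, i;
        int *sendbuf, *recvbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        sendbuf = malloc(count * sizeof(int));
        recvbuf = malloc(count * sizeof(int));
        for (i = 0; i < count; i++)
            sendbuf[i] = rank;

        /* Rooted collective: every iteration funnels data towards
         * rank 0, with no barrier or other flow control in between. */
        for (i = 0; i < iters; i++)
            MPI_Reduce(sendbuf, recvbuf, count, MPI_INT, MPI_SUM,
                       0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("done: %d iterations of MPI_Reduce(%d ints)\n",
                   iters, count);

        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }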
If you type "ompi_info --param btl tcp", you will see the eager size
for the TCP BTL. Everything smaller than this size will be sent
eagerly; it can therefore become unexpected on the receiver side, which
can lead to this problem. As a quick test, you can add
"--mca btl_tcp_eager_limit 2048" to your mpirun command line, and this
problem will not happen for sizes over 2K. This was the original
solution for the flow control problem. If you know your application
will generate thousands of unexpected messages, then you should set
the eager limit to zero.
I tried running Reduce with 4096 ints (16384 bytes), 16 nodes and eager
limit 2048:
mpirun -hostfile lamhosts.all.r360 -np 16 --mca btl_tcp_eager_limit 2048 ./ompi-crash 4096 2 3000
{ 'groupsize' : 16, 'count' : 4096, 'bytes' : 16384, 'bufbytes' : 262144, 'iters' : 3000, 'bmno' : 2
[compute-2-2][0,1,10][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
[compute-3-2][0,1,14][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed with errno=104
mca_btl_tcp_frag_recv: readv failed with errno=104
mpirun noticed that job rank 0 with PID 30407 on node compute-0-0 exited
on signal 15 (Terminated).
15 additional processes aborted (not shown)
This one tries to run Reduce with 1 integer per node and also crashes
(with eager size 0):
mpirun -hostfile lamhosts.all.r360 -np 16 --mca btl_tcp_eager_limit 0 ./ompi-crash 1 2 3000
...
This is puzzling.
I'm mostly familiarizing myself with OpenMPI at the moment, as well as
poking around to see how the collective operations work and perform
compared to LAM. This is partly because I have some ideas I'd like to
test out, and partly because I'm considering moving some student
exercises over from LAM to OpenMPI. I don't expect to write actual
applications that treat MPI like this myself, but on the other hand,
not having to do throttling on top of MPI could be an advantage in some
application patterns.
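By throttling I mean something like inserting an occasional barrier so
that no rank can run far ahead of the others. Roughly, using the loop
from the sketch above (the interval of 16 is an arbitrary number picked
just for illustration):

    /* Throttled variant: a barrier every THROTTLE iterations keeps
     * fast ranks from flooding the root with unexpected eager
     * messages. */
    #define THROTTLE 16
    for (i = 0; i < iters; i++) {
        MPI_Reduce(sendbuf, recvbuf, count, MPI_INT, MPI_SUM,
                   0, MPI_COMM_WORLD);
        if ((i + 1) % THROTTLE == 0)
            MPI_Barrier(MPI_COMM_WORLD);
    }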
Regards,
--
// John Markus Bjørndalen
// http://www.cs.uit.no/~johnm/