On Thu, 26 Oct 2006 15:11:46 -0600, George Bosilca <bosi...@cs.utk.edu> wrote:

> The Open MPI behavior is the same independently of the network used
> for the job. At least the behavior dictated by our internal message
> passing layer.

Which is one of the things I like about Open MPI.

> There is nothing (that has a reasonable cost) we can do about this.

Nor do I think anything should be done. In all honesty, I think it's a good thing that TCP & Myrinet have such long timeouts. It makes administration a bit less scary; if you accidentally unplug the network cable from the wrong node during maintenance, neither the MPI run nor the administrator loses a job.

I'm also confident that both TCP & Myrinet would throw an error when they time out; it's just that I haven't felt the need to verify it. (And with a some-odd-20-minute timeout for Myrinet, it takes a bit of attention span; the last time I tried it, I'd forgotten about it for 3-4 hours.)

> If none are available, then Open
> MPI is supposed to abort the job. For your particular run did you have
> Ethernet between the nodes? If yes, I'm quite sure the MPI run
> wasn't stopped ... it continued using the TCP device (if not disabled
> by hand at mpirun time).

This brings up an interesting question: the job was simply Intel's MPI benchmark (IMB), which is fairly chatty (i.e. lots of screen output).

On the first try, I used '--mca btl ^gm,^mx' to start the job. Ethernet was connected (eth0 = 10/100, eth1 = gigabit), but after the IB cable was disconnected, everything stopped. The link lights (Ethernet & IB) were not blinking, nor did any of the system monitors show much TCP traffic; certainly not the sort of traffic one would expect from an IMB run.

I've also tried using '--mca btl openib,sm,self,tcp' (specifically adding TCP) and didn't see any difference; the job still got 'stuck' as soon as the IB cable was removed. I'll let that job continue to run overnight (i.e. --mca btl tcp,openib,sm,self) to see if the job ever wakes up.
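
(For the archives, the full invocations look roughly like this; the process count, hostfile, and IMB binary path are just stand-ins from my setup, nothing significant:)

  # First try: exclude the Myrinet BTLs (a single leading ^ excludes
  # everything in the list), leaving openib and tcp available:
  mpirun -np 16 --hostfile ./hosts --mca btl ^gm,mx ./IMB-MPI1

  # Second try: name the BTLs explicitly, with tcp added on purpose:
  mpirun -np 16 --hostfile ./hosts --mca btl openib,sm,self,tcp ./IMB-MPI1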

With --mca btl ^tcp (or --mca btl openib,sm,self):

I get the messages that something is amiss with the IB fabric (as expected). However, the job does *not* abort. Every (MPI) process on every node in the job is still active and consuming 100% of its CPU (busy-waiting, I imagine).
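
(In case anyone wants to reproduce this, the quick-and-dirty way I've been confirming the spin is below; the binary name and the pid are of course placeholders:)

  # Show the surviving ranks and their CPU usage / state:
  ps -o pid,pcpu,stat,comm -C IMB-MPI1

  # Attach a debugger to one of them and run 'bt' to see where it is
  # spinning (then 'detach' and 'quit' to leave it running):
  gdb -p 12345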

> PS: There are several internal message passing modules available for
> Open MPI. The default one looks more for performance than
> reliability. If reliability is what you need, you should use the DR
> PML. For this, you can specify --mca pml dr at mpirun time. This (DR)
> PML has data reliability and timeouts (Open MPI internal timeouts that
> are configurable), allowing it to recover faster from a network failure.
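
Presumably the invocation would be something along these lines (same placeholder hostfile and binary as above):

  # Ask for the DR PML instead of the default, everything else unchanged:
  mpirun -np 16 --hostfile ./hosts --mca pml dr ./IMB-MPI1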

I don't have such a component. Hopefully it's just the version of Open MPI I'm using (1.1), or a ./configure option I didn't use. (If it should be in 1.1, I'll take a deeper look and can provide things like the config.log, etc.; I just don't want to flood the list at the moment.)
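
For the record, this is roughly how I looked for it ($MPI_PREFIX below is just a stand-in for wherever Open MPI is installed):

  # ompi_info lists the PML components it was built with; if DR were
  # present, there would be an "MCA pml: dr ..." line here:
  ompi_info | grep "MCA pml"

  # In a default (shared/DSO) build the components also show up in the
  # install tree as mca_pml_<name>.so:
  ls $MPI_PREFIX/lib/openmpi | grep pml
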
--
Troy Telford
