Jeff Squyres wrote:

On Aug 24, 2009, at 1:03 PM, Eugene Loh wrote:

E.g., let's say P0 and P1 each send a message to P2, both using the same tag and communicator. Let's say P2 does two receives on that communicator and tag, using a wildcard source. So, the messages could be received in either order. One could introduce barriers to order the messages. E.g.,

P0:
  Send
  Barrier
P1:
  Barrier
  Send
P2:
  Recv
  Barrier
  Recv

Is this behavior *guaranteed* by MPI? I'm not actually sure that it is; barrier does not provide any guarantees about point-to-point message passing progress.

For example, how about a machine with these assumptions:

- P0 is "far away" from P2 on the point-to-point network
- P1 is "close by" to P2 on the point-to-point network
- Barriers go across a separate/fast network (think: bluegene)
- P0's send message is short/eager

In this case, the Send from P0 completes "immediately" and P0 enters the barrier before the message is delivered to P2. The P0 send could then take a "long time" to get to P2 --

Okay, so let's say P0 completes its send and enters the barrier.

Also, P1 enters the barrier. But it will not issue a send until it leaves the barrier, which requires that the last process has entered the barrier.

Meanwhile, the last process, P2, is waiting on a receive before it enters the barrier.

So, here's the situation. P2 is waiting to receive a message, a message has been sent to P2, and no other message will be sent to P2 until some message has been received. So, there are only two options:

1) The first receive on P2 receives the message from P0.  Or,

2) This perfectly legal MPI program deadlocks.

Right?
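To check the "only two options" reasoning mechanically, here is a minimal sketch in plain Python (not MPI) that enumerates every interleaving of a toy model of the three processes. The model is an assumption, not anything taken from the MPI standard: a send completes locally at once, its message is then delivered to P2 at an arbitrary later point, a wildcard receive may take any message that has already arrived, and the barrier releases only once all three processes have entered it. The names receive_orders, WITH_BARRIER and NO_BARRIER are made up for this illustration.

```python
# Toy model: every send targets P2, and each process sends at most once,
# so a message can be identified by the name of its sender.
WITH_BARRIER = {
    "P0": ["send", "barrier"],
    "P1": ["barrier", "send"],
    "P2": ["recv", "barrier", "recv"],
}
NO_BARRIER = {"P0": ["send"], "P1": ["send"], "P2": ["recv", "recv"]}


def receive_orders(programs):
    """Explore all interleavings; return the set of possible receive orders.

    A stuck state in which some process has not finished is reported
    as the string "deadlock".
    """
    procs = sorted(programs)
    # State: program counters, processes waiting in the barrier,
    # messages in flight, messages arrived at P2, receive order so far.
    start = (tuple(0 for _ in procs), frozenset(), frozenset(), frozenset(), ())
    outcomes, seen, stack = set(), {start}, [start]
    while stack:
        pcs, waiting, in_flight, arrived, order = stack.pop()
        successors = []
        for i, p in enumerate(procs):
            if p in waiting or pcs[i] == len(programs[p]):
                continue  # blocked in the barrier, or already finished
            op = programs[p][pcs[i]]
            bumped = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
            if op == "send":
                # Eager send: completes locally, message is now in flight.
                successors.append((bumped, waiting, in_flight | {p}, arrived, order))
            elif op == "barrier":
                successors.append((pcs, waiting | {p}, in_flight, arrived, order))
            elif op == "recv":
                # Wildcard receive: any message that has arrived will match.
                for m in arrived:
                    successors.append(
                        (bumped, waiting, in_flight, arrived - {m}, order + (m,)))
        if len(waiting) == len(procs):
            # Everyone has entered, so the barrier releases all processes.
            released = tuple(pc + 1 for pc in pcs)
            successors.append((released, frozenset(), in_flight, arrived, order))
        for m in in_flight:
            # The network may deliver any in-flight message at any time.
            successors.append((pcs, waiting, in_flight - {m}, arrived | {m}, order))
        if not successors:
            done = all(pcs[i] == len(programs[p]) for i, p in enumerate(procs))
            outcomes.add(order if done else "deadlock")
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return outcomes
```

Under these assumptions, receive_orders(WITH_BARRIER) contains the single order ("P0", "P1") and no deadlock, while receive_orders(NO_BARRIER) contains both orders. The result is only as good as the model: in particular, it assumes that a message which has been sent is eventually delivered.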

potentially long enough for the barrier to overtake it

No. The first Recv on P2 has to complete before P2 can enter the barrier, which is a prerequisite for the barrier to complete on any process.

and for the Send from P1 to be delivered to P2 before the Send from P0 arrives at P2.

Couldn't that happen?

No. The send on P1 cannot be issued before the barrier completes on P1, which cannot happen before the barrier is entered on P2, which cannot happen before the first Recv on P2 is completed, which cannot happen until some message is received on P2. And, the only message that can be received on P2 is the one issued by P0.
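The dependency chain above can be tried out with a small thread-based sketch. This is a plain-Python stand-in, not MPI: threading.Barrier plays the role of the barrier, a queue.Queue plays the role of the messages that have arrived at P2, and a threading.Timer models the slow link from P0 (the eager send returns at once, the message shows up later). The function name simulate and the p0_latency parameter are hypothetical, chosen only for this illustration.

```python
import queue
import threading


def simulate(p0_latency=0.2):
    """Run the three 'processes' as threads; return P2's receive order."""
    barrier = threading.Barrier(3)
    mailbox = queue.Queue()  # messages that have arrived at P2
    received = []

    def p0():
        # Eager send over a slow link: returns immediately, but the
        # message reaches P2's mailbox only after p0_latency seconds.
        threading.Timer(p0_latency, mailbox.put, args=("from P0",)).start()
        barrier.wait()

    def p1():
        barrier.wait()
        # Fast link: the message arrives as soon as it is sent.
        mailbox.put("from P1")

    def p2():
        received.append(mailbox.get())  # wildcard receive, blocks
        barrier.wait()
        received.append(mailbox.get())

    threads = [threading.Thread(target=f) for f in (p0, p1, p2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(simulate())  # ['from P0', 'from P1']
```

However large p0_latency is made, the barrier cannot release before P2's first receive returns, so P1's message is never sent early enough to arrive first. That holds for this model; whether a real MPI implementation must behave the same way is the question being argued in this thread.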

Granted, I would expect that your example would perform in most real-world situations as you describe (P0 is delivered to P2, then P1 is delivered to P2). But I don't think the standard guarantees it.
