Shaun Jackman wrote:
Jeff Squyres wrote:
On Aug 26, 2009, at 10:38 AM, Jeff Squyres (jsquyres) wrote:

Yes, this could cause blocking.  Specifically, the receiver may not
advance any other senders until the matching Irecv is posted and is
able to make progress.
I should clarify something else here -- for long messages where the pipeline protocol is used, OB1 may need to be invoked repeatedly to keep making progress on all the successive fragments. I.e., if a send is long enough to entail many fragments, then OB1 may (read: likely will) not progress *all* of them simultaneously. Hence, if you're calling MPI_Test(), for example, to kick the progress engine, you may have to call it a few times to get *all* the fragments processed.

How many fragments are processed in each call to progress can depend on the speed of your hardware and network, etc.
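
To illustrate (a rough sketch, not code from Open MPI itself; peer, TAG, and do_some_local_work() are placeholders): pre-post the receive and then poll MPI_Test so the progress engine gets invoked often enough to push all of the fragments through:

    /* Rough sketch only -- peer, TAG, and do_some_local_work() are
     * placeholders, and the buffer size is arbitrary but large enough
     * that the pipeline protocol would kick in. */
    char        buf[4 * 1024 * 1024];
    MPI_Request req;
    MPI_Status  status;
    int         done = 0;

    MPI_Irecv(buf, sizeof(buf), MPI_CHAR, peer, TAG, MPI_COMM_WORLD, &req);

    while (!done) {
        do_some_local_work();            /* overlap computation with the transfer */
        MPI_Test(&req, &done, &status);  /* each call may only progress some fragments */
    }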

Hi Jeff,

Looking at the source code of MPI_Request_get_status, it...
- calls OPAL_CR_NOOP_PROGRESS()
- returns true in *flag if request->req_complete
- otherwise calls opal_progress()
- and returns false in *flag
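
In other words, the flow seems to be roughly this (my paraphrase of the source, not the actual Open MPI code):

    int MPI_Request_get_status(MPI_Request request, int *flag,
                               MPI_Status *status)
    {
        OPAL_CR_NOOP_PROGRESS();        /* no-op unless checkpoint/restart support is active */

        if (request->req_complete) {    /* request already finished */
            *flag = 1;
            /* ... copy the request's status into *status ... */
            return MPI_SUCCESS;
        }

        opal_progress();                /* otherwise, kick the progress engine once */
        *flag = 0;                      /* and report "not complete" */
        return MPI_SUCCESS;
    }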

What's the difference between OPAL_CR_NOOP_PROGRESS() and opal_progress()?

OPAL_CR_NOOP_PROGRESS() seems to be related to checkpoint/restart and is a no-op unless fault-tolerance is being used.

Two questions then...

1. If the request has already completed, opal_progress() is never called. Does that mean no further progress is made on any other outstanding requests during this call?

2. request->req_complete is tested before opal_progress() is called. Is it possible that request->req_complete becomes true during the call to opal_progress(), so that this function returns false in *flag even though the request has in fact completed?
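
In other words, could a polling loop like this one (my own sketch; req is an outstanding request) spin one extra iteration because a completion that happens inside opal_progress() isn't reported until the next call?

    int        flag = 0;
    MPI_Status status;

    while (!flag) {
        /* If the request completes inside opal_progress() during this call,
         * *flag is apparently still set to false; the completion would only
         * be reported on the next iteration. */
        MPI_Request_get_status(req, &flag, &status);
    }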

Thanks,
Shaun
