Sorry for the delay in replying...
On Sep 1, 2009, at 1:11 AM, Shaun Jackman wrote:
> Looking at the source code of MPI_Request_get_status, it...
> calls OPAL_CR_NOOP_PROGRESS()
> returns true in *flag if request->req_complete
> calls opal_progress()
> returns false in *flag
Keep in mind that MPI_REQUEST_GET_STATUS is exactly the same as
MPI_TEST except that the MPI_Request will not be deallocated if the
request has completed.
> What's the difference between OPAL_CR_NOOP_PROGRESS() and
> opal_progress()? If the request has already completed, does it mean
> that since opal_progress() is not called, no further progress is
> made?
> OPAL_CR_NOOP_PROGRESS() seems to be related to checkpoint/restart and
> is a no-op unless fault-tolerance is being used.
Correct.
> Two questions then...
> 1. If the request has already completed, does it mean that since
> opal_progress() is not called, no further progress is made?
Correct. It's a latency thing; if your request has already completed,
we just tell you without further delay (i.e., without invoking
opal_progress(), which may trigger lots of other things, and therefore
increase the latency of MPI_REQUEST_GET_STATUS returning).
opal_progress() is our lowest-level progression engine call. It kicks
all kinds of registered progression callbacks from all over the code
base.
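To make the "registered callbacks" idea concrete, here is a toy model of such an engine. It is only a sketch and not the Open MPI source: every name in it (toy_progress_register, toy_progress, the two stand-in callbacks, the table size) is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a progress engine; not Open MPI code. */
#define TOY_MAX_CALLBACKS 8

/* A progress callback returns the number of events it completed. */
typedef int (*toy_progress_cb_t)(void);

static toy_progress_cb_t toy_callbacks[TOY_MAX_CALLBACKS];
static size_t toy_num_callbacks = 0;

/* Register a callback; returns 0 on success, -1 if the table is full. */
int toy_progress_register(toy_progress_cb_t cb)
{
    assert(cb != NULL);
    if (toy_num_callbacks == TOY_MAX_CALLBACKS) {
        return -1;
    }
    toy_callbacks[toy_num_callbacks++] = cb;
    return 0;
}

/* One pass of the engine: invoke every registered callback once and
 * return the total number of events they reported. */
int toy_progress(void)
{
    int events = 0;
    for (size_t i = 0; i < toy_num_callbacks; ++i) {
        events += toy_callbacks[i]();
    }
    return events;
}

/* Two stand-in components: one that always completes an event and
 * one that has nothing to do. */
int toy_net_calls = 0;
int toy_net_progress(void)   { ++toy_net_calls; return 1; }
int toy_timer_progress(void) { return 0; }
```

Every call to toy_progress() walks the whole table, so its cost grows with the number of registered components. That is the latency we avoid by skipping the progress call when the request is already complete.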
> 2. request->req_complete is tested before calling opal_progress(). Is
> it possible that request->req_complete is now true after calling
> opal_progress() when this function returns false in *flag?
Yes. I suppose it could be an optimization to duplicate the block
testing for request->req_complete==true below the call to
opal_progress(). I'm guessing the only reason it wasn't done was to
avoid code duplication. Additionally, the call to opal_progress() is
surrounded by an #if block testing OPAL_ENABLE_PROGRESS_THREADS -- if
we have progress threads enabled, the thought was that opal_progress()
(and friends) would be invoked automatically (and probably
continuously) by other threads. The progression thread code is not
well tested -- I'd be surprised if it worked at all, because I doubt
anyone is testing it -- but it has been in our design since the very
beginning. This is likely another reason we don't test again for
req_complete==true after the call to opal_progress() -- because that
block would need to be protected by that #if, leading to further code
complexity.
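A rough sketch of the two control flows, for comparison. This is a standalone toy, not the actual function body: toy_request, toy_post, toy_progress_stub and TOY_ENABLE_PROGRESS_THREADS are hypothetical stand-ins, and the polarity of the #if (progress is called only when progress threads are disabled) is my reading of the description above rather than a quote from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are Open MPI symbols. */
#ifndef TOY_ENABLE_PROGRESS_THREADS
#define TOY_ENABLE_PROGRESS_THREADS 0
#endif

struct toy_request {
    volatile bool req_complete;
};

/* The request that the toy progress call will complete next. */
static struct toy_request *toy_in_flight = NULL;

/* Mark a request as pending and hand it to the toy engine. */
void toy_post(struct toy_request *req)
{
    assert(req != NULL);
    req->req_complete = false;
    toy_in_flight = req;
}

/* Stand-in for opal_progress(): completes the in-flight request. */
static void toy_progress_stub(void)
{
    if (toy_in_flight != NULL) {
        toy_in_flight->req_complete = true;
        toy_in_flight = NULL;
    }
}

/* Flow as described in this thread: check, progress, report false. */
void toy_get_status(struct toy_request *req, int *flag)
{
    assert(req != NULL && flag != NULL);
    if (req->req_complete) {
        *flag = 1;
        return;
    }
#if !TOY_ENABLE_PROGRESS_THREADS
    toy_progress_stub();
#endif
    *flag = 0;
}

/* Variant with the duplicated check after the progress call. The
 * second check has to live inside the same #if as the call itself. */
void toy_get_status_recheck(struct toy_request *req, int *flag)
{
    assert(req != NULL && flag != NULL);
    if (req->req_complete) {
        *flag = 1;
        return;
    }
#if !TOY_ENABLE_PROGRESS_THREADS
    toy_progress_stub();
    if (req->req_complete) {
        *flag = 1;
        return;
    }
#endif
    *flag = 0;
}
```

With the first version, a request that completes during the progress call is still reported as incomplete, and the caller only sees flag set on the next call. The second version reports it immediately, at the cost of the duplicated block.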
--
Jeff Squyres
jsquy...@cisco.com