Sorry for the delay in replying...

On Sep 1, 2009, at 1:11 AM, Shaun Jackman wrote:

> Looking at the source code of MPI_Request_get_status, it...
> calls OPAL_CR_NOOP_PROGRESS()
> returns true in *flag if request->req_complete
> calls opal_progress()
> returns false in *flag


Keep in mind that MPI_REQUEST_GET_STATUS is exactly the same as MPI_TEST except that the MPI_Request will not be deallocated if the request has completed.
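
To make that distinction concrete, here is a toy model in C. It is only a sketch and not Open MPI code: toy_request_t, toy_request_new(), toy_test() and toy_get_status() are hypothetical names invented for illustration. The only behavior it tries to capture is that the MPI_TEST-like call releases a completed request and nulls the handle, while the MPI_REQUEST_GET_STATUS-like call leaves the handle alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request object; not the real ompi_request_t. */
typedef struct {
    bool req_complete;   /* true once the operation has finished */
} toy_request_t;

/* Allocate a toy request in the given completion state. */
static toy_request_t *toy_request_new(bool complete)
{
    toy_request_t *req = malloc(sizeof(*req));
    if (req != NULL) {
        req->req_complete = complete;
    }
    return req;
}

/* MPI_TEST-like: on completion, free the request and null the handle. */
static bool toy_test(toy_request_t **handle)
{
    assert(handle != NULL && *handle != NULL);
    if (!(*handle)->req_complete) {
        return false;
    }
    free(*handle);
    *handle = NULL;      /* plays the role of MPI_REQUEST_NULL */
    return true;
}

/* MPI_REQUEST_GET_STATUS-like: report completion, never free the request. */
static bool toy_get_status(toy_request_t *const *handle)
{
    assert(handle != NULL && *handle != NULL);
    return (*handle)->req_complete;
}
```

In this model you can call toy_get_status() on a completed request as many times as you like and the handle stays valid; a single successful toy_test() leaves the handle NULL.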

> What's the difference between OPAL_CR_NOOP_PROGRESS() and
> opal_progress()? If the request has already completed, does it mean
> that since opal_progress() is not called, no further progress is made?

> OPAL_CR_NOOP_PROGRESS() seems to be related to checkpoint/restart and
> is a no-op unless fault-tolerance is being used.


Correct.

> Two questions then...
>
> 1. If the request has already completed, does it mean that since
> opal_progress() is not called, no further progress is made?


Correct. It's a latency thing; if your request has already completed, we just tell you without further delay (i.e., without invoking opal_progress(), which may trigger lots of other things, and therefore increase the latency of MPI_REQUEST_GET_STATUS returning).
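
Roughly, the order of operations looks like the sketch below. This is a simplified model for illustration, not the actual source: toy_request_t, toy_progress() and toy_get_status() are hypothetical stand-ins, and the polls_needed counter is just a device to make the toy progress call complete the request after a fixed number of invocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request; not the real ompi_request_t. */
typedef struct {
    bool req_complete;   /* true once the operation has finished */
    int  polls_needed;   /* toy device: progress calls left until completion */
} toy_request_t;

/* Counts how often the toy progress engine was invoked. */
static int toy_progress_calls = 0;

/* Hypothetical stand-in for opal_progress(): advance the request one step. */
static void toy_progress(toy_request_t *req)
{
    toy_progress_calls++;
    if (!req->req_complete && req->polls_needed > 0 &&
        --req->polls_needed == 0) {
        req->req_complete = true;
    }
}

/* Same order of operations as described in this thread. */
static void toy_get_status(toy_request_t *req, bool *flag)
{
    assert(req != NULL && flag != NULL);
    if (req->req_complete) {
        *flag = true;        /* fast path: answer without progressing */
        return;
    }
    toy_progress(req);       /* slow path: kick the progress engine once */
    *flag = false;           /* reported even if the request just completed */
}
```

With an already-completed request, toy_progress_calls stays at zero; that is the latency saving. With a pending request, exactly one progress call is made per invocation.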

opal_progress() is our lowest-level progression engine call. It kicks all kinds of registered progression callbacks from all over the code base.

> 2. request->req_complete is tested before calling opal_progress(). Is
> it possible that request->req_complete is now true after calling
> opal_progress() when this function returns false in *flag?



Yes. I suppose it could be an optimization to duplicate the block testing for request->req_complete==true below the call to opal_progress(). I'm guessing the only reason it wasn't done was to avoid code duplication.

Additionally, the call to opal_progress() is surrounded by an #if block testing OPAL_ENABLE_PROGRESS_THREADS -- if we have progress threads enabled, the thought was that opal_progress() (and friends) would be invoked automatically (and probably continuously) by other threads. The progression thread code is not well tested -- I'd be surprised if it worked at all, because I doubt anyone is testing it -- but it has been in our design since the very beginning.

This is likely another reason we don't test again for req_complete==true after the call to opal_progress() -- because that block would need to be protected by that #if, leading to further code complexity.
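
For what it's worth, the optimization would look something like the sketch below. Again, this is a toy model and not a patch: TOY_ENABLE_PROGRESS_THREADS is a hypothetical macro standing in for OPAL_ENABLE_PROGRESS_THREADS, the "== 0" form of the guard is my assumption about how the real #if is written, and toy_request_t, toy_progress() and toy_get_status_recheck() are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical switch standing in for OPAL_ENABLE_PROGRESS_THREADS. */
#ifndef TOY_ENABLE_PROGRESS_THREADS
#define TOY_ENABLE_PROGRESS_THREADS 0
#endif

/* Hypothetical stand-in for a request; not the real ompi_request_t. */
typedef struct {
    bool req_complete;   /* true once the operation has finished */
    int  polls_needed;   /* toy device: progress calls left until completion */
} toy_request_t;

/* Hypothetical stand-in for opal_progress(): advance the request one step. */
static void toy_progress(toy_request_t *req)
{
    if (!req->req_complete && req->polls_needed > 0 &&
        --req->polls_needed == 0) {
        req->req_complete = true;
    }
}

/* Variant that tests req_complete again after progressing. */
static void toy_get_status_recheck(toy_request_t *req, bool *flag)
{
    assert(req != NULL && flag != NULL);
    if (req->req_complete) {
        *flag = true;
        return;
    }
#if TOY_ENABLE_PROGRESS_THREADS == 0
    /* No progress thread: progress by hand, then look again. */
    toy_progress(req);
    if (req->req_complete) {   /* the duplicated completion test */
        *flag = true;
        return;
    }
#endif
    *flag = false;
}
```

The second completion test has to live inside the same #if as the progress call, which is the extra complexity I mentioned. The payoff is that a request finished by this very call is reported immediately rather than on the next invocation.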

--
Jeff Squyres
jsquy...@cisco.com
