Re: [OMPI users] mca_pml_ob1_send blocks

Jeff Squyres Tue, 1 Sep 2009 23:53:20 -0400

Sorry for the delay in replying...


On Sep 1, 2009, at 1:11 AM, Shaun Jackman wrote:

> Looking at the source code of MPI_Request_get_status, it...
> calls OPAL_CR_NOOP_PROGRESS()
> returns true in *flag if request->req_complete
> calls opal_progress()
> returns false in *flag

Keep in mind that MPI_REQUEST_GET_STATUS is exactly the same as
MPI_TEST except that the MPI_Request will not be deallocated if the
request has completed.

> What's the difference between OPAL_CR_NOOP_PROGRESS() and
> opal_progress()? If the request has already completed, does it mean
> that since opal_progress() is not called, no further progress is made?


> OPAL_CR_NOOP_PROGRESS() seems to be related to checkpoint/restart and
> is a no-op unless fault-tolerance is being used.


Correct.

> Two questions then...
>
> 1. If the request has already completed, does it mean that since
> opal_progress() is not called, no further progress is made?

Correct. It's a latency thing; if your request has already completed,
we just tell you without further delay (i.e., without invoking
opal_progress(), which may trigger lots of other things, and therefore
increase the latency of MPI_REQUEST_GET_STATUS returning).

opal_progress() is our lowest-level progression engine call. It kicks
all kinds of registered progression callbacks from all over the code
base.
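
For readers following along, here is a minimal sketch in C of the control
flow described above. It is a mock, not Open MPI source: mock_request_t,
mock_progress_register(), mock_progress(), mock_get_status(), and the demo
callback are all hypothetical stand-ins, and the real progress engine does
far more than loop over an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are Open MPI's real internals. */
typedef struct {
    bool req_complete;
} mock_request_t;

typedef int (*progress_cb_t)(void);

#define MAX_CALLBACKS 8
progress_cb_t callbacks[MAX_CALLBACKS];
size_t n_callbacks = 0;

/* Register a callback to be run on every pass of the mock engine. */
void mock_progress_register(progress_cb_t cb)
{
    assert(n_callbacks < MAX_CALLBACKS);
    callbacks[n_callbacks++] = cb;
}

/* One pass of the engine: run every registered callback and
 * return the total number of events they report. */
int mock_progress(void)
{
    int events = 0;
    for (size_t i = 0; i < n_callbacks; i++) {
        events += callbacks[i]();
    }
    return events;
}

/* Fast path first, then progress, as described in the thread. */
void mock_get_status(mock_request_t *req, bool *flag)
{
    if (req->req_complete) {
        *flag = true;      /* already complete: skip the progress call */
        return;
    }
    mock_progress();       /* may complete req as a side effect */
    *flag = false;         /* still reported as incomplete */
}

/* Demo state: a callback that completes one pending request. */
mock_request_t pending = { false };
int progress_calls = 0;

int complete_pending_cb(void)
{
    progress_calls++;
    if (!pending.req_complete) {
        pending.req_complete = true;
        return 1;
    }
    return 0;
}
```

With complete_pending_cb registered, the first mock_get_status(&pending,
&flag) call runs the engine once and leaves flag false; a second call takes
the fast path, sets flag to true, and never touches the engine.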

> 2. request->req_complete is tested before calling opal_progress(). Is
> it possible that request->req_complete is now true after calling
> opal_progress() when this function returns false in *flag?

Yes. I suppose it could be an optimization to duplicate the block
testing for request->req_complete==true below the call to
opal_progress(). I'm guessing the only reason it wasn't done was to
avoid code duplication.

Additionally, the call to opal_progress() is surrounded by an #if block
testing OPAL_ENABLE_PROGRESS_THREADS -- if we have progress threads
enabled, the thought was that opal_progress() (and friends) would be
invoked automatically (and probably continuously) by other threads.
The progression thread code is not well tested -- I'd be surprised if
it worked at all, because I doubt anyone is testing it -- but it has
been in our design since the very beginning. This is likely another
reason we don't test again for req_complete==true after the call to
opal_progress() -- because that block would need to be protected by
that #if, leading to further code complexity.
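
As an illustration only, the re-check discussed above might look roughly
like the following mock. MOCK_ENABLE_PROGRESS_THREADS is a hypothetical
stand-in for the real configure macro, and the "== 0" form of the guard is
an assumption; mock_request_t, mock_progress(), inflight, and
mock_get_status_recheck() are likewise invented for this sketch and are not
Open MPI code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the real progress-threads macro. */
#ifndef MOCK_ENABLE_PROGRESS_THREADS
#define MOCK_ENABLE_PROGRESS_THREADS 0
#endif

typedef struct {
    bool req_complete;
} mock_request_t;

/* Request that the mock engine will complete on its next pass, if any. */
mock_request_t *inflight = NULL;

/* Mock progress engine: completes the in-flight request, if there is one. */
void mock_progress(void)
{
    if (inflight != NULL) {
        inflight->req_complete = true;
    }
}

void mock_get_status_recheck(mock_request_t *req, bool *flag)
{
    assert(req != NULL && flag != NULL);

    if (req->req_complete) {       /* fast path, unchanged */
        *flag = true;
        return;
    }
#if MOCK_ENABLE_PROGRESS_THREADS == 0
    /* No progress thread: drive progress ourselves, then test again.
     * This duplicated test is the extra block that has to live inside
     * the #if. */
    mock_progress();
    if (req->req_complete) {
        *flag = true;
        return;
    }
#endif
    *flag = false;
}
```

The cost is exactly what is described above: the completion test appears
twice, and the second copy has to sit inside the preprocessor guard.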


--
Jeff Squyres
jsquy...@cisco.com
