In this particular case, I don't think the solution is that obvious. If you look at the stack in the original email, you will see how we get into this situation. The problem here is that FREE_LIST_WAIT is used to get a fragment to store an unexpected message. If this macro returns NULL (in other words, the PML is unable to store the unexpected message), what do you expect to happen? Drop the message? Ask the BTL to hold it for a while? How about ordering?

It is unfortunate to say it only a few days after we had the discussion about flow control, but the only correct solution here is to add PML-level flow control ...

  george.

On Feb 28, 2008, at 2:55 PM, Christian Bell wrote:

On Thu, 28 Feb 2008, Gleb Natapov wrote:

The trick is to call progress only from functions that are called
directly by a user process. Never call progress from a callback function. The main offenders against this rule are calls to OMPI_FREE_LIST_WAIT(). They should be changed to OMPI_FREE_LIST_GET() and deal with a NULL return value.
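
A minimal sketch of the "get and deal with NULL" pattern described above, assuming a toy singly linked free list. The names frag_t, free_list_t, free_list_get, free_list_return and on_unexpected_message are hypothetical stand-ins for illustration only; they are not the Open MPI macros or types. The callback never waits and never calls progress; it reports failure and leaves the policy question (drop, hold, flow control) to its caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the Open MPI definitions. */
typedef struct frag {
    struct frag *next;
    int payload;
} frag_t;

typedef struct {
    frag_t *head;
} free_list_t;

/* Non-blocking get: returns NULL when the list is empty and
 * never calls progress. */
static frag_t *free_list_get(free_list_t *fl)
{
    frag_t *f = fl->head;
    if (f != NULL) {
        fl->head = f->next;
    }
    return f;
}

/* Put a fragment back at the head of the list. */
static void free_list_return(free_list_t *fl, frag_t *f)
{
    f->next = fl->head;
    fl->head = f;
}

/* Callback that would run from inside progress when an unexpected
 * message arrives. Returns 0 on success, or -1 when no fragment is
 * available; in that case *out is set to NULL and the caller has to
 * decide what happens to the message. */
static int on_unexpected_message(free_list_t *fl, int payload, frag_t **out)
{
    frag_t *f = free_list_get(fl);
    if (f == NULL) {
        *out = NULL;
        return -1;
    }
    f->payload = payload;
    f->next = NULL;
    *out = f;
    return 0;
}
```

The sketch only shows where the NULL surfaces. What the caller should do with a -1 result is exactly the open question raised in the reply at the top of this thread.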

Right -- and it should be easy to find more offenders by having an
assert statement soak in the builds for a while (or by default in
debug mode).
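
One way such a check could look, as a single-threaded sketch with hypothetical names (progress, progress_depth, reentry_count, STRICT_PROGRESS and the two callbacks are all invented here and are not Open MPI symbols). A depth counter detects when progress is entered while already running; with STRICT_PROGRESS defined the check becomes a hard assert, otherwise offenders are only counted.

```c
#include <assert.h>
#include <stddef.h>

/* Single-threaded sketch: a real implementation would need one
 * counter per thread. */
static int progress_depth = 0;  /* > 0 while progress is running */
static int reentry_count = 0;   /* number of re-entrant calls seen */

typedef void (*callback_fn)(void);

/* Hypothetical progress function that invokes one callback. */
static void progress(callback_fn cb)
{
#ifdef STRICT_PROGRESS
    /* Debug mode: abort on the first re-entrant call. */
    assert(progress_depth == 0);
#endif
    if (progress_depth > 0) {
        reentry_count++;
    }
    progress_depth++;
    if (cb != NULL) {
        cb();
    }
    progress_depth--;
}

/* A callback that follows the rule: it does not call progress. */
static void well_behaved_callback(void)
{
}

/* A callback that breaks the rule by calling progress again. */
static void offending_callback(void)
{
    progress(NULL);
}
```

Calling progress(offending_callback) without STRICT_PROGRESS leaves reentry_count at 1, which is the softer "soak in the builds for a while" variant; defining the macro turns the same condition into an abort.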

Was it ever part of the (or a) design to allow re-entrant
calls to progress from the same calling thread?  It can be done but
callers have to have a holistic view of how other components require
and make the progress happen -- this isn't compatible with the Open
MPI model of independent dynamically loadable components.

--
christian.b...@qlogic.com
(QLogic Host Solutions Group, formerly Pathscale)
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
