On Aug 14, 2013, at 9:23 AM, "Hazelrig, Chris CTR (US)" 
<christopher.c.hazelrig....@mail.mil> wrote:

> Thanks for your suggestions.  I had already tested for which threads were 
> reaching the Finalize() call and all of them are.  Also, the Finalize() call 
> is not inside a conditional.  This seems to suggest there may be a prior 
> communication left unfinished, but based on the documentation I have read I 
> would think the Finalize() routine would error/exception out in that 
> situation.  

Sorry for the delayed reply -- I was on vacation last week.

Not necessarily -- you can definitely deadlock in Finalize if, for example, 
you've done a send that was never matched with a receive.  Finalize has to 
complete all outstanding communication before it returns, and a send that 
will never be matched can never complete, so Finalize can hang rather than 
raise an error.

> It seems significant that the software was performing as expected under the 
> previous OS and OpenMPI versions (although, the older OpenMPI version is only 
> slightly older than what is being used now), but I don't know yet what the 
> differences are.

Possibly, but not definitely.  Just because an application runs properly under 
one MPI implementation does not mean that the application is correct (that 
sounds snobby, but I don't mean it that way).  Buffer allocations and blocking 
patterns can change from release to release of a given MPI implementation, so 
an erroneous MPI application may happen to work fine under version A but fail 
under version B of that same implementation.

> Is there any other information I could provide that might be useful?

You might want to audit the code and ensure that you have no pending 
communications that haven't finished -- check all your sends and receives, not 
just in the code, but at run-time (e.g., use an MPI profiling tool to match up 
the sends and receives, and see what's left at Finalize time).

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
