On Aug 14, 2013, at 9:23 AM, "Hazelrig, Chris CTR (US)" <christopher.c.hazelrig....@mail.mil> wrote:
> Thanks for your suggestions. I had already tested for which threads were
> reaching the Finalize() call and all of them are. Also, the Finalize() call
> is not inside a conditional. This seems to suggest there may be a prior
> communication left unfinished, but based on the documentation I have read I
> would think the Finalize() routine would error/exception out in that
> situation.

Sorry for the delayed reply -- I was on vacation last week.

Not necessarily -- you can definitely deadlock in Finalize if you've done a send that isn't matched with a receive, for example.

> It seems significant that the software was performing as expected under the
> previous OS and OpenMPI versions (although, the older OpenMPI version is only
> slightly older than what is being used now), but I don't know yet what the
> differences are.

Possibly, but not definitely. Just because an application runs properly under an MPI implementation does not mean that the application is correct (that sounds snobby, but I don't mean it that way). Buffer allocations and blocking patterns change from release to release of a given MPI implementation, such that an erroneous MPI application may work fine under version A of that implementation but fail under version B of the same implementation.

> Is there any other information I could provide that might be useful?

You might want to audit the code and ensure that you have no pending communications that haven't finished -- check all your sends and receives, not just in the code, but at run-time (e.g., use an MPI profiling tool to match up the sends and receives, and see what's left at Finalize time).

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
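[Editor's illustration] The run-time audit suggested above -- pairing up logged sends and receives and reporting whatever is left over at Finalize time -- can be sketched in a few lines. Everything here is hypothetical: real PMPI-based profiling tools emit their own trace formats, and this sketch matches only on (source rank, destination rank, tag), ignoring communicators and MPI_ANY_SOURCE/MPI_ANY_TAG wildcards. It shows the idea, not a drop-in tool.

```python
# Hypothetical audit sketch: find point-to-point operations with no
# matching counterpart in a trace of (rank, op, peer, tag) events.
# The event format is an assumption, not any real tool's output.
from collections import Counter

def unmatched_operations(events):
    """events: iterable of (rank, 'send'|'recv', peer, tag) tuples.
    Returns a Counter of operations still unmatched -- i.e., what
    would be left pending when the application reaches Finalize."""
    pending = Counter()
    for rank, op, peer, tag in events:
        if op == "send":
            key = (rank, peer, tag)            # message from rank -> peer
            if pending[("recv", key)] > 0:     # a recv is already waiting
                pending[("recv", key)] -= 1
            else:
                pending[("send", key)] += 1    # send not yet received
        elif op == "recv":
            key = (peer, rank, tag)            # receive on rank from peer
            if pending[("send", key)] > 0:     # pair it with a prior send
                pending[("send", key)] -= 1
            else:
                pending[("recv", key)] += 1    # recv with no send (yet)
    return +pending                            # drop zero-count entries

# Example trace: one matched pair, plus a send that is never received --
# exactly the kind of leftover that can hang Finalize.
trace = [
    (0, "send", 1, 7),   # rank 0 -> rank 1, tag 7
    (1, "recv", 0, 7),   # matched
    (0, "send", 1, 9),   # rank 0 -> rank 1, tag 9: no matching recv
]
leftover = unmatched_operations(trace)
print(leftover)   # the tag-9 send shows up as unmatched
```

A nonempty result from a trace that runs up to Finalize is the smoking gun the reply describes: a communication started but never completed.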