On Jan 29, 2010, at 8:23 AM, Laurence Marks wrote:

> I'll try, but sometimes these things are hard to reproduce and I have
> to wait for free nodes to do the test.

Understood.

> If I do manage to reproduce the
> issue (I've added ERR= traps, so would have to regress) anything else
> to look at?

You might want to write up a trivial Fortran example outside of your main app 
-- a 10-20 line app that explicitly reads past the end of a trivial file in one 
MPI process while all the other processes are waiting in an MPI_Barrier, or 
some such.  That way you could test this easily even on 1 node, and not have to 
regress your source, etc.

I think counting the processes should be sufficient.  But with a small/trivial 
test like described above, you might even want to put in some extra print* 
statements, just to verify exactly where the process stopped, whether it 
actually exited, etc.
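For example, a minimal sketch of such a test might look like the following. It assumes an MPI installation that provides the "mpi" Fortran module. The program name eof_test and the file name tiny.dat are made up for illustration. Rank 0 writes a one-line file and then tries to read two values from it with no ERR=/END= trap, so the second read should abort rank 0 with a runtime end-of-file error. Meanwhile, every other rank prints a message and blocks in MPI_Barrier.

```fortran
! Hypothetical reproducer (program and file names are illustrative only):
! rank 0 reads past the end of a one-line file with no ERR=/END= trap,
! while every other rank waits in MPI_Barrier.
program eof_test
  use mpi
  implicit none
  integer :: ierr, rank, i, val

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  if (rank == 0) then
     ! Create a trivial file containing a single integer.
     open(unit=10, file='tiny.dat', status='replace', action='write')
     write(10, *) 42
     close(10)

     ! Read it back twice; the second read runs past the end of the file.
     open(unit=10, file='tiny.dat', status='old', action='read')
     do i = 1, 2
        print *, 'rank 0: attempting read', i
        ! No ERR=/END=/IOSTAT= here, so hitting EOF is a fatal runtime error.
        read(10, *) val
        print *, 'rank 0: read value', val
     end do
     close(10)
  end if

  ! The other ranks get here right away and block until rank 0 arrives,
  ! which it never should if the read above aborted it.
  print *, 'rank', rank, ': entering MPI_Barrier'
  call MPI_Barrier(MPI_COMM_WORLD, ierr)
  print *, 'rank', rank, ': past MPI_Barrier'

  call MPI_Finalize(ierr)
end program eof_test
```

Something like "mpif90 eof_test.f90 -o eof_test" followed by "mpirun -np 4 ./eof_test" on a single node should be enough. The print statements show how far each rank got. After rank 0 dies, you can then count the eof_test processes (e.g., with ps) to see whether the ranks waiting in MPI_Barrier actually exited or were left hanging.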

-- 
Jeff Squyres
jsquy...@cisco.com

