On Jan 29, 2010, at 8:23 AM, Laurence Marks wrote:

> I'll try, but sometimes these things are hard to reproduce and I have
> to wait for free nodes to do the test.

Understood.

> If I do manage to reproduce the
> issue (I've added ERR= traps, so would have to regress) any thing else
> to look at?

You might want to write up a trivial Fortran example outside of your main app -- a 10-20 line app that explicitly reads past the end of a trivial file in one MPI process while all the other processes are waiting in an MPI_Barrier, or somesuch. That way you could test this easily even on 1 node, and not have to regress your source, etc.

I think counting the processes should be sufficient. But with a small/trivial test like described above, you might even want to put in some extra print* statements, just to verify exactly where the process stopped, whether it actually exited, etc.

-- 
Jeff Squyres
jsquy...@cisco.com
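
A rough sketch of the kind of test described above, in case it is useful as a starting point. It is only an illustration under assumptions: the program name, the file name eof_test.dat, the choice of rank 0 as the failing process, and the wording of the messages are all made up here, and it runs a little over 20 lines because of the extra prints. Rank 0 writes a one-record file and then reads two records with no ERR=, END=, or IOSTAT= handling, so the Fortran runtime should abort that process while the other ranks sit in MPI_Barrier.

```fortran
! Hypothetical test program: one rank reads past end-of-file while the
! others wait in MPI_Barrier. All names here are illustrative only.
program eof_barrier_test
  use mpi
  use iso_fortran_env, only: output_unit
  implicit none
  integer, parameter :: file_unit = 10   ! arbitrary unit number
  integer :: ierr, rank, nprocs, i, val

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  print *, 'rank', rank, 'of', nprocs, ': started'
  flush(output_unit)

  if (rank == 0) then
    ! Create a file holding a single record, then try to read two.
    open(unit=file_unit, file='eof_test.dat', status='replace', &
         action='readwrite')
    write(file_unit, *) 42
    rewind(file_unit)
    do i = 1, 2
      print *, 'rank', rank, ': about to read record', i
      flush(output_unit)
      ! No ERR=/END=/IOSTAT= on purpose: the second read hits
      ! end-of-file and the runtime is expected to terminate this rank.
      read(file_unit, *) val
      print *, 'rank', rank, ': read value', val
      flush(output_unit)
    end do
    close(file_unit)
  end if

  print *, 'rank', rank, ': entering barrier'
  flush(output_unit)
  call MPI_Barrier(MPI_COMM_WORLD, ierr)
  print *, 'rank', rank, ': passed barrier'
  flush(output_unit)

  call MPI_Finalize(ierr)
end program eof_barrier_test
```

Assuming an MPI installation that provides the usual mpif90 and mpirun wrappers, this could be built with "mpif90 eof_test.f90 -o eof_test" and launched with "mpirun -np 4 ./eof_test" on a single node. The flush calls are there so that the last message from each rank is more likely to reach the terminal before anything is killed. Counting the surviving eof_test processes (for example with ps) after rank 0 aborts would then show whether the remaining ranks were cleaned up or left waiting in the barrier.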