OK, but trivial codes don't always reproduce the problem. Would strace be useful here?

On Fri, Jan 29, 2010 at 7:32 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> On Jan 29, 2010, at 8:23 AM, Laurence Marks wrote:
>
>> I'll try, but sometimes these things are hard to reproduce and I have
>> to wait for free nodes to do the test.
>
> Understood.
>
>> If I do manage to reproduce the
>> issue (I've added ERR= traps, so would have to regress) any thing else
>> to look at?
>
> You might want to write up a trivial fortran example outside of your main app
> -- a 10-20 line app that explicitly reads past the end of a trivial file in
> one MPI process while all the other processes are waiting in an MPI_Barrier,
> or somesuch. That way you could test this easily even on 1 node, and not
> have to regress your source, etc.
>
> I think counting the processes should be sufficient. But with a
> small/trivial test like described above, you might even want to put in some
> extra print* statements, just to verify exactly where the process stopped,
> whether it actually exited, etc.
>
> --
> Jeff Squyres
> jsquy...@cisco.com

--
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron
scattering and imaging to study the structure of matter.
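
For reference, the trivial test described in the quoted message might look roughly like the sketch below. It is not code from either poster. It assumes an MPI installation that provides the Fortran "mpi" module, and the program name, unit number, and file name (trivial.dat) are made up for illustration. Rank 0 writes a two-record file and then tries to read three records with no ERR=, END=, or IOSTAT= on the READ, while every other rank goes straight to MPI_Barrier.

```fortran
program eof_barrier_test
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, i, val
  integer, parameter :: lun = 10                        ! arbitrary unit number
  character(len=*), parameter :: fname = 'trivial.dat'  ! hypothetical file name

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  if (rank == 0) then
     ! Create a file holding exactly two records.
     open(unit=lun, file=fname, status='replace', action='write')
     write(lun, *) 1
     write(lun, *) 2
     close(lun)

     ! Try to read three records. The READ has no ERR=, END=, or IOSTAT=,
     ! so the third read is an unhandled end-of-file condition.
     open(unit=lun, file=fname, status='old', action='read')
     do i = 1, 3
        print *, 'rank 0: about to read record', i
        read(lun, *) val
        print *, 'rank 0: read value', val
     end do
     close(lun)
  end if

  ! All other ranks arrive here immediately and wait for rank 0.
  print *, 'rank', rank, 'of', nprocs, ': entering barrier'
  call MPI_Barrier(MPI_COMM_WORLD, ierr)
  print *, 'rank', rank, ': passed barrier'

  call MPI_Finalize(ierr)
end program eof_barrier_test
```

With Open MPI this would typically be built with the mpif90 wrapper and launched with something like "mpirun -np 4 ./a.out", after which the surviving processes can be counted on the node. The extra print statements are there to show where rank 0 stopped, although output forwarded through mpirun may be buffered and appear out of order.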