I managed to find time to reproduce the issue, although the results are
not consistent from run to run, and I suspect it may not be easy to
reproduce with a simple code. I have also never actually written an
MPI code myself. (I am cc'ing Michael Sternberg, who compiled
Open MPI, in case there are flags to add to the compilation.)

I have 8 processes on a single dual quad-core machine reading from the
same file using formatted Fortran I/O. I deliberately created an error
in the read. If this error is a format error, all the processes
terminate. If the error is because there is not enough data (EOF), I
am left with anywhere from 1 to 7 zombies. They don't seem to be doing
anything (top -ulmarks shows no CPU activity), but I have no idea if
they have locks on the file or anything else (I think they might, but
have no idea how to tell).
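
On how to tell: the following is only a minimal sketch in Python, and
it assumes a Linux node with /proc mounted. The helper names
(process_state, open_files, holds_file) are made up for illustration
and are not part of Open MPI or any other library. A process in state
Z (a true zombie) has already closed its file descriptors, so it cannot
still be holding the file; a process in state S or D that still lists
the file is hung rather than dead.

```python
import os


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat ('R', 'S', 'D', 'Z', ...)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so take the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def open_files(pid):
    """Return the targets of the file descriptors that pid has open."""
    fd_dir = f"/proc/{pid}/fd"
    try:
        fds = os.listdir(fd_dir)
    except OSError:
        # Process is gone, is a zombie, or belongs to another user.
        return []
    paths = []
    for fd in fds:
        try:
            paths.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            # Descriptor was closed between listdir and readlink.
            continue
    return paths


def holds_file(pid, path):
    """True if pid currently has path open (assumed check: compares resolved paths)."""
    return os.path.realpath(path) in open_files(pid)
```

One would call process_state(pid) and holds_file(pid, "input.dat") for
each leftover PID shown by top, where "input.dat" is a placeholder for
the file being read.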

On Fri, Jan 29, 2010 at 6:18 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> On Jan 29, 2010, at 9:13 AM, Laurence Marks wrote:
>
>> OK, but trivial codes don't always reproduce problems.
>
> Yes, but if the problem is a file reading beyond the end, that should be 
> fairly isolated behavior.
>
>> Is strace useful?
>
> Sure.  But let's check to see if the apps are actually dying or hanging first.
>
> --
> Jeff Squyres
> jsquy...@cisco.com



-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron
scattering and imaging to study the structure of matter.
