This started as "Re: [OMPI users] Trapping fortran I/O errors leaving zombie mpi processes", but the problem is more severe than that subject suggests.
Sorry, but it appears that, at least with ifort, most run-time errors and signals will leave zombie processes behind with Open MPI if they occur on only some of the processors and not all. You can test this with the attached files using (for instance)

    mpicc -c doraise.c
    mpif90 -o crash_test crash_test.F doraise.o -FR -xHost -O3

Then, as appropriate,

    mpirun -np 8 crash_test

The output is self-explanatory. The program has options both to simulate common Fortran problems and to send Fortran or C signals to the process.

Please note that the results can depend on the level of optimization, and that other compilers may complain about SIGSEGV or other errors, since the code deliberately tries to create these.

--
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron scattering and imaging to study the structure of matter.
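/* doraise.c: C helpers, callable from Fortran, that raise and describe signals. */
#include <signal.h>
#include <stdio.h>

/* Raise the signal whose number is passed by reference (Fortran convention). */
void doraise(long isig[1])
{
    int i = (int) isig[0];
    raise(i);   /* signal i is raised */
}

/* Trailing-underscore alias for Fortran compilers that append one to external names. */
void doraise_(long isig[1])
{
    doraise(isig);
}

/* Print the system description of the given signal number. */
void whatsig(long isig[1])
{
    int i = (int) isig[0];
    psignal(i, "Testing Signal");
}

void whatsig_(long isig[1])
{
    whatsig(isig);
}

/* Print the system description of signals 1 through 31. */
void showallsignals(void)
{
    int i;
    char buf[32];   /* room for "Signal code NN " plus the terminating NUL */

    for (i = 1; i < 32; i++) {
        snprintf(buf, sizeof buf, "Signal code %d ", i);
        psignal(i, buf);
    }
}

void showallsignals_(void)
{
    showallsignals();
}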
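To check that the by-reference calling convention delivers the intended signal number without needing MPI or the Fortran driver, something like the following could be used. This is only a minimal sketch and not part of the original attachment: `record_signal` and `raise_and_catch` are hypothetical names introduced here, and `doraise` is copied locally so the block stands alone. It uses only ISO C `signal()` and `raise()`.

```c
#include <signal.h>

/* Last signal number seen by the handler (hypothetical bookkeeping variable). */
static volatile sig_atomic_t last_signal = 0;

/* Hypothetical handler: records which signal arrived and does nothing else. */
static void record_signal(int sig)
{
    last_signal = sig;
}

/* Local copy of the helper from the post: the signal number arrives by reference. */
static void doraise(long isig[1])
{
    raise((int) isig[0]);
}

/*
 * Hypothetical helper: install the recording handler for `sig`, raise the
 * signal the way a Fortran caller would, then restore the previous handler.
 * Returns the signal number the handler saw, 0 if it never ran, or -1 if
 * the handler could not be installed.
 */
int raise_and_catch(int sig)
{
    long isig[1];
    void (*previous)(int) = signal(sig, record_signal);

    if (previous == SIG_ERR)
        return -1;

    last_signal = 0;
    isig[0] = sig;
    doraise(isig);            /* handler runs before raise() returns */
    signal(sig, previous);    /* put the original disposition back */
    return (int) last_signal;
}
```

Calling `raise_and_catch(SIGTERM)` from a small `main` should return `SIGTERM` if the number was passed through correctly. Without such a handler, the default action for most signals terminates the calling process, which is the situation the test program in this post deliberately provokes.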
Attachment: crash_test.F (binary data)