FWIW, I'm unable to replicate your behavior. This is with Open MPI 1.4.2 on RHEL5:
----
[9:52] svbu-mpi:~/mpi % cat abort.c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (0 == rank) {
        abort();
    }
    printf("Rank %d sleeping...\n", rank);
    sleep(600);
    printf("Rank %d finalizing...\n", rank);

    MPI_Finalize();
    return 0;
}
[9:52] svbu-mpi:~/mpi % mpicc abort.c -o abort
[9:52] svbu-mpi:~/mpi % ls -l core*
ls: No match.
[9:52] svbu-mpi:~/mpi % mpirun -np 4 --bynode --host svbu-mpi055,svbu-mpi056 ./abort
Rank 1 sleeping...
[svbu-mpi055:03991] *** Process received signal ***
[svbu-mpi055:03991] Signal: Aborted (6)
[svbu-mpi055:03991] Signal code:  (-6)
[svbu-mpi055:03991] [ 0] /lib64/libpthread.so.0 [0x2b45caac87c0]
[svbu-mpi055:03991] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x2b45cad05265]
[svbu-mpi055:03991] [ 2] /lib64/libc.so.6(abort+0x110) [0x2b45cad06d10]
[svbu-mpi055:03991] [ 3] ./abort(main+0x36) [0x4008ee]
[svbu-mpi055:03991] [ 4] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b45cacf2994]
[svbu-mpi055:03991] [ 5] ./abort [0x400809]
[svbu-mpi055:03991] *** End of error message ***
Rank 3 sleeping...
Rank 2 sleeping...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3991 on node svbu-mpi055 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
[9:52] svbu-mpi:~/mpi % ls -l core*
-rw------- 1 jsquyres eng5 26009600 Aug 16 09:52 core.abort-1281977540-3991
[9:52] svbu-mpi:~/mpi % file core.abort-1281977540-3991
core.abort-1281977540-3991: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'abort'
[9:52] svbu-mpi:~/mpi %
-----

You can see that all processes die immediately, and I get a corefile from
the process that called abort().


On Aug 16, 2010, at 9:25 AM, David Ronis wrote:

> I've tried both--as you said, MPI_Abort doesn't drop a core file, but
> does kill off the entire MPI job.
> abort() drops core when I'm running on 1 processor, but not in a
> multiprocessor run.  In addition, a node calling abort() doesn't lead
> to the entire run being killed off.
>
> David
>
> On Mon, 2010-08-16 at 08:51 -0700, Jeff Squyres wrote:
>> On Aug 13, 2010, at 12:53 PM, David Ronis wrote:
>>
>>> I'm using mpirun and the nodes are all on the same machine (an 8 cpu box
>>> with an Intel i7).  coresize is unlimited:
>>>
>>> ulimit -a
>>> core file size          (blocks, -c) unlimited
>>
>> That looks good.
>>
>> In reviewing the email thread, it's not entirely clear: are you calling
>> abort() or MPI_Abort()?  MPI_Abort() won't drop a core file.  abort() should.
>>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/