I've been having an intermittent failure during MPI initialization (Open MPI v1.4.3) for several months. It comes and goes as I make changes to my application, that is, changes unrelated to MPI calls. Even with a build of my application that exhibits the problem, it doesn't happen on every job submission. This is a representative stack trace:

[mtcompute-6-6:05845] *** Process received signal ***
[mtcompute-6-6:05845] Signal: Segmentation fault (11)
[mtcompute-6-6:05845] Signal code: Address not mapped (1)
[mtcompute-6-6:05845] Failing at address: 0x2ac352e0bd80
[mtcompute-6-6:05845] [ 0] /lib64/libpthread.so.0 [0x314ee0eb10]
[mtcompute-6-6:05845] [ 1] /opt/openmpi/lib/libmpi.so.0 [0x2b2b3d42fa70]
[mtcompute-6-6:05845] [ 2] /opt/openmpi/lib/libopen-pal.so.0(opal_progress+0x5a) [0x2b2b3fa694ea]
[mtcompute-6-6:05845] [ 3] /opt/openmpi/lib/libopen-rte.so.0 [0x2b2b3f80913c]
[mtcompute-6-6:05845] [ 4] /opt/openmpi/lib/libmpi.so.0 [0x2b2b3d3f160c]
[mtcompute-6-6:05845] [ 5] /opt/openmpi/lib/libmpi.so.0(MPI_Init+0xf0) [0x2b2b3d40eb00]
[mtcompute-6-6:05845] [ 6] /home/cummings/DART/DARTHome/bin/linux/DebrisProp [0x418610]
[mtcompute-6-6:05845] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x31df41d994]
[mtcompute-6-6:05845] [ 8] /home/cummings/DART/DARTHome/bin/linux/DebrisProp [0x417992]
[mtcompute-6-6:05845] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 5845 on node mtcompute-6-6.local exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Any suggestions would be welcome.

- Jeff Cummings