I've been having an intermittent failure during MPI initialization (v 
1.4.3) for several months.  It comes and goes as I make changes to my 
application, that is changes unrelated to MPI calls.  Even when I have a 
version of my app which shows the problem, it doesn't happen on every 
submittal.  This is a representative stack trace:

[mtcompute-6-6:05845] *** Process received signal ***
[mtcompute-6-6:05845] Signal: Segmentation fault (11)
[mtcompute-6-6:05845] Signal code: Address not mapped (1)
[mtcompute-6-6:05845] Failing at address: 0x2ac352e0bd80
[mtcompute-6-6:05845] [ 0] /lib64/libpthread.so.0 [0x314ee0eb10]
[mtcompute-6-6:05845] [ 1] /opt/openmpi/lib/libmpi.so.0 [0x2b2b3d42fa70]
[mtcompute-6-6:05845] [ 2] 
/opt/openmpi/lib/libopen-pal.so.0(opal_progress+0x5a) [0x2b2b3fa694ea]
[mtcompute-6-6:05845] [ 3] /opt/openmpi/lib/libopen-rte.so.0 
[0x2b2b3f80913c]
[mtcompute-6-6:05845] [ 4] /opt/openmpi/lib/libmpi.so.0 [0x2b2b3d3f160c]
[mtcompute-6-6:05845] [ 5] /opt/openmpi/lib/libmpi.so.0(MPI_Init+0xf0) 
[0x2b2b3d40eb00]
[mtcompute-6-6:05845] [ 6] 
/home/cummings/DART/DARTHome/bin/linux/DebrisProp [0x418610]
[mtcompute-6-6:05845] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) 
[0x31df41d994]
[mtcompute-6-6:05845] [ 8] 
/home/cummings/DART/DARTHome/bin/linux/DebrisProp [0x417992]
[mtcompute-6-6:05845] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 5845 on node 
mtcompute-6-6.local exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Any suggestions would be welcome.

- Jeff Cummings

Reply via email to