On Feb 25, 2009, at 12:25 PM, Ken Mighell wrote:
We are trying to compile the code with Open MPI on a Mac Pro with 2
quad-core Xeons using gfortran.
The code seems to be running ... for the most part. Unfortunately we
keep getting a segfault
which spits out a variant of the following message:
[oblix:21522] *** Process received signal ***
[oblix:21522] Signal: Segmentation fault (11)
[oblix:21522] Signal code: Address not mapped (1)
[oblix:21522] Failing at address: 0xc0000710
[oblix:21522] [ 0] 2 libSystem.B.dylib 0x92a892bb _sigtramp + 43
[oblix:21522] [ 1] 3 ??? 0xffffffff 0x0 + 4294967295
[oblix:21522] [ 2] 4 exe.out 0x0001281b MAIN__ + 4875
[oblix:21522] [ 3] 5 exe.out 0x00013c38 main + 40
[oblix:21522] [ 4] 6 exe.out 0x00001936 start + 54
[oblix:21522] *** End of error message ***
After researching the error message and digging around in the
Open MPI users' mailing list,
it appears that the bug may be in Open MPI.
I'm not sure what you mean by this -- getting a stack trace out of
Open MPI doesn't necessarily mean a bug in Open MPI.
Can you get a corefile and look to see what exactly failed? Or run
under a debugger to see where and how exactly the process fails? From
the stack trace above, it looks like the failure occurs in application
code (the resolved frames are all in exe.out), not in Open MPI...?
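
One rough way to check that reading of the trace is to list the module
each frame belongs to and see whether any of them is an Open MPI library.
The sketch below is only an illustration: the frames are copied from the
trace above (wrapped lines rejoined), and the list of Open MPI name
prefixes is an assumption about how the libraries and plugins are
typically named, not something taken from this report.

```python
import re

# Frames copied from the trace in this thread, one frame per line.
TRACE = """\
[oblix:21522] [ 0] 2 libSystem.B.dylib 0x92a892bb _sigtramp + 43
[oblix:21522] [ 1] 3 ??? 0xffffffff 0x0 + 4294967295
[oblix:21522] [ 2] 4 exe.out 0x0001281b MAIN__ + 4875
[oblix:21522] [ 3] 5 exe.out 0x00013c38 main + 40
[oblix:21522] [ 4] 6 exe.out 0x00001936 start + 54
"""

# "[ N] <n> <module> <address> <symbol> + <offset>"
FRAME = re.compile(
    r"\[\s*(\d+)\]\s+\d+\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\S+) \+ \d+"
)

# Assumption: name prefixes that would indicate an Open MPI library or plugin.
OMPI_PREFIXES = ("libmpi", "libopen-rte", "libopen-pal", "mca_")


def frames(trace):
    """Return (frame index, module, symbol) for each frame line in the trace."""
    matches = (FRAME.search(line) for line in trace.splitlines())
    return [(int(m.group(1)), m.group(2), m.group(3)) for m in matches if m]


def open_mpi_frames(trace):
    """Return only the frames whose module name looks like Open MPI."""
    return [f for f in frames(trace) if f[1].startswith(OMPI_PREFIXES)]


if __name__ == "__main__":
    for index, module, symbol in frames(TRACE):
        print(index, module, symbol)
    print("Open MPI frames:", open_mpi_frames(TRACE))
```

An empty list here only says that no resolved frame is inside Open MPI.
Frame 1 is unresolved (???), so a corefile or a debugger session is still
needed to see what actually went wrong.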
--
Jeff Squyres
Cisco Systems