Did you try to follow the advice on the LAPACK mailing list, i.e. upgrade your compiler from the Mac OS X default (gcc 4.0.1) to gcc 4.3.0?

Btw, what is the test you're running? Can you create a small test case so I can try to reproduce it?

Thanks,
  george.

On Jun 11, 2009, at 17:02 , Nick Collier wrote:

Hi,

I'm developing under OSX 10.5.7 with Open-MPI 1.3.2 and am running into intermittent corruption when sending and receiving a user-defined data type. When running with fewer than four processes (i.e. mpirun -np [2,3]), the data is fine; when running with 4 or more, the received data is intermittently corrupted. By corrupted, I mean that what should be small integer values in a struct are very large, as if the memory hasn't been assigned properly. Some runs are fine and others aren't, leading to crashes like:

[belafonte:30191] *** Process received signal ***
[belafonte:30191] Signal: Bus error (10)
[belafonte:30191] Signal code:  (2)
[belafonte:30191] Failing at address: 0x9
[belafonte:30191] [ 0] 2 libSystem.B.dylib 0x945af2bb _sigtramp + 43
[belafonte:30191] [ 1] 3 ??? 0xffffffff 0x0 + 4294967295

I'm not sure how to proceed or what might be wrong. The closest thing I could find on Google was http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2&t=614 where someone reports issues with ScaLAPACK in combination with Open-MPI and OSX's stock gcc 4.0.1 that were fixed by using gcc 4.3.1.

At any rate, any suggestions on how to move forward would be appreciated.

thanks,

Nick
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
