Re: [OMPI users] "casual" error

2009-03-05 Thread Biagio Lucini
Many thanks for your help; it was not clear to me whether it was OPAL, my application or the standard C libraries that were causing the segfault. It is already good news that the problem is not at the level of Open MPI, since that would have meant upgrading the library. My first reaction would be to

Re: [OMPI users] "casual" error

2009-03-05 Thread George Bosilca
Absolutely :) The last few entries on the stack are from OPAL (one of the Open MPI libraries), which traps the segfault. Everything else indicates where the segfault happened. What I can tell from this stack trace is the following: the problem started in your function wait_thread which called

[OMPI users] "casual" error

2009-03-05 Thread Biagio Lucini
We have an application that runs for a very long time with 16 processes (on the order of a few months; we do have checkpoints, but that is not the issue). Twice it has failed with the error message appended below after running undisturbed for 20-25 days. It has happened twi