Matthias,
I think that the patch attached to the ticket below should address
your issue:
https://svn.open-mpi.org/trac/ompi/ticket/1619
I was able to reproduce this problem fairly reliably with a particular
benchmark, a particular configuration, and very frequent
checkpoints. With the patch applied I could no longer reproduce it,
so I think this fixes the problem.
While tracking down this bug, I came to believe that there is a problem
with the way the checkpoint/restart coordination component handles
MPI_ANY_SOURCE and MPI_ANY_TAG. I'll pursue a fix for these cases, but
it will be much more involved than the one currently attached to the
ticket.
Let me know if this patch fixes the problem that you are seeing.
Thank you for your patience and the bug report,
Josh
On Oct 31, 2008, at 9:49 AM, Matthias Hovestadt wrote:
> Hi!
>
>> I'll work on a patch, and let you know when it is ready.
>> Unfortunately it probably won't be for a couple weeks. :(
>
> Ok, thanks a lot for letting me know. In three weeks we'll
> have a booth at ICT
> (http://ec.europa.eu/information_society/events/ict/2008)
> where we plan to showcase fault tolerance mechanisms, with
> OMPI as a major checkpointing component. I think I will use the
> time until ICT for finding a workaround for this issue... :-)
>
> Best,
> Matthias