Hi Everyone!

I would like to understand how checkpoint tools such as BLCR and DMTCP
work with OpenMPI. I would be glad if you could answer the following
questions:

1) BLCR and DMTCP take checkpoints of the parallel processes. Are the
checkpoints taken in a coordinated way? I mean, is there synchronization
among the processes in order to reach a consistent global state?

2) If the checkpoint is coordinated, is OpenMPI responsible for doing
that?

3) Is there a way to tell OpenMPI to take only uncoordinated checkpoints,
rather than coordinated ones?

4) Where can I find documentation on how to configure these tools with
OpenMPI?

Thank you very much!

Edson
