Hi everyone! I would like to understand how checkpoint tools such as BLCR and DMTCP work with OpenMPI. I would be glad if you could answer the following questions:
1) BLCR and DMTCP take checkpoints of the parallel processes. Are the checkpoints taken in a coordinated way? That is, do the processes synchronize in order to reach a consistent global state?
2) If the checkpoint is coordinated, is OpenMPI responsible for that coordination?
3) Is there a way to tell OpenMPI to take uncoordinated checkpoints instead of coordinated ones?
4) Where can I find documentation on how to configure these tools with OpenMPI?

Thank you very much!
Edson