Hi Everyone!
I would like to understand how checkpointing tools such as BLCR and DMTCP
work with Open MPI. I would be glad if you could answer the following
questions:
1) BLCR and DMTCP take checkpoints of the parallel processes. Are the
checkpoints taken in a coordinated way? I mean, is there a
s
> Based on your questions, I would suggest you take a look at the
> ULFM-enabled version of Open MPI. You can find it at
> http://fault-tolerance.org/.
>
> George.
>
>
> On Aug 11, 2013, at 15:33, Edson Tavares de Camargo
> wrote:
>
>> Thanks a lot for your reply, R
On Aug 10, 2013, at 11:07 AM, Edson Tavares de Camargo
> wrote:
>
Hi All,
I was looking for posts about fault tolerance in MPI and I found the post
below:
http://www.open-mpi.org/community/lists/users/2012/06/19658.php
I am trying to understand all the work on failure detection present in
Open MPI. So I began with a simple application, a ring application
(rin