[OMPI users] Question about checkpoint tools on OpenMPI

2015-10-06 Thread Edson Tavares de Camargo
Hi everyone! I would like to understand how checkpoint tools such as BLCR and DMTCP work with Open MPI. I would be glad if you could answer the following questions for me: 1) BLCR and DMTCP take checkpoints of the parallel processes. Are the checkpoints taken in a coordinated way? I mean, there is a s

Re: [OMPI users] Fault Tolerant Features in OpenMPI

2013-08-12 Thread Edson Tavares de Camargo
Based on your questions I would suggest you take a look at the
> ULFM-enabled version of Open MPI. You can find it at
> http://fault-tolerance.org/.
>
> George.
>
>
> On Aug 11, 2013, at 15:33, Edson Tavares de Camargo
> wrote:
>
>> Thanks a lot for your reply, R

Re: [OMPI users] Fault Tolerant Features in OpenMPI

2013-08-11 Thread Edson Tavares de Camargo
On Aug 10, 2013, at 11:07 AM, Edson Tavares de Camargo
> wrote:
>
>> Hi All,
>>
>> I was looking for posts about fault tolerance in MPI and I found the post
>> below:
>>
>> http://www.open-mpi.org/community/lists/users/2012/06/19658.php
>>
>> I am

[OMPI users] Fault Tolerant Features in OpenMPI

2013-08-10 Thread Edson Tavares de Camargo
Hi All, I was looking for posts about fault tolerance in MPI and I found the post below: http://www.open-mpi.org/community/lists/users/2012/06/19658.php I am trying to understand all the work on failure detection present in Open MPI. So, I began with a simple application, a ring application (rin