Hi Zhang

We have seen little interest in binary-level CR over the years, which is the 
primary reason the support has lapsed. The approach just doesn’t scale very 
well. Once the graduate student who wrote it received his degree, there simply 
wasn’t enough user-level interest to motivate the developer members to maintain 
it.

In the interim, we’ve seen considerable interest in application-level CR in its 
place. You might check out the SCR library from LLNL as an example of what 
people are doing in that space:

https://computation.llnl.gov/project/scr/ 

We did have someone (another graduate student) recently work with the community 
to attempt to restore the binary-level CR support, but he didn’t get a chance 
to complete it prior to graduating. So we are removing the leftover code from 
the 2.x release series until someone comes along with enough interest to repair 
it.

Assuming that hasn’t happened before sometime next year, I might take a shot at 
it then - but I won’t have any time to work on it before next spring at the 
earliest, and as I said, it isn’t clear there is a significant user base for 
binary-level CR with the shift to application-level systems.

HTH
Ralph


> On Sep 18, 2015, at 8:14 AM, Dave Love <d.l...@liverpool.ac.uk> wrote:
> 
> "gzzh...@buaa.edu.cn" <gzzh...@buaa.edu.cn> writes:
> 
>> Hi Team 
>> I am trying to use MPI to do some testing and study of C/R-enabled
>> debugging.  Professor Josh Hursey said that the feature never made it
>> into a release, so it was only ever available on the trunk. However,
>> since that time the C/R functionality has fallen into disrepair, and
>> it is most likely broken in the trunk today. I tried the current
>> openmpi-master source code: it configures, but make fails because of
>> bugs that still exist, according to the log. Is it possible to
>> download the historical Open MPI developer code that supports
>> C/R-enabled debugging? I appreciate your offer to help us.
> 
> This does seem an important deficiency, and a good reason to stay with
> 1.6 or use mpich.
> 
> However, DMTCP is supposed to be able to checkpoint over TCP and
> Infiniband without any extra support.  I'm intending to try it soon and
> would be interested to know any relevant experience.  There used to be a
> note about not working over IB with some OMPI implementation detail
> (URC?) but I can't find that now, and the web site implies it should
> work.
> 
> See http://dmtcp.sourceforge.net/
