On Tue, Dec 14, 2021 at 1:17 AM Andy Jacobson <a...@yovo.org> wrote:
>
> Those are good points, Duncan. I am experimenting with a nice checkpointing
> tool called DMTCP. It operates on the system level but is quite OS-dependent.
> It can be found at http://dmtcp.sourceforge.net/index.html.
>
> Still, it would be nice to be able to checkpoint calls within R to
> potentially long-running processes like optim().
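
As a rough sketch of what checkpointing a long-running optim() call could look like in plain R today: run optim() in short chunks and save the current parameters to disk after each chunk. The helper checkpointed_optim() and its arguments (file, chunk_maxit, max_chunks) are hypothetical names made up for this illustration, not an existing API. Note that each restart resets the optimizer's internal state (e.g. the BFGS Hessian approximation), so the path taken may differ from a single uninterrupted call.

```r
# Hypothetical helper (not part of R): run optim() in chunks of 'chunk_maxit'
# iterations and save the state to 'file' after each chunk, so that a killed
# session can pick up from the last saved parameters.
checkpointed_optim <- function(par, fn, ..., file, chunk_maxit = 50L,
                               max_chunks = 100L, method = "BFGS") {
  # Resume from an earlier checkpoint if one exists, otherwise start fresh.
  state <- if (file.exists(file)) {
    readRDS(file)
  } else {
    list(par = par, value = NA_real_, chunk = 0L,
         convergence = NA_integer_, done = FALSE)
  }

  while (!state$done && state$chunk < max_chunks) {
    res <- optim(state$par, fn, ..., method = method,
                 control = list(maxit = chunk_maxit))
    state$par <- res$par
    state$value <- res$value
    state$convergence <- res$convergence
    state$chunk <- state$chunk + 1L
    # optim() returns convergence code 1 when the iteration limit was hit;
    # any other code means it stopped for another reason, so we stop too.
    state$done <- (res$convergence != 1L)
    saveRDS(state, file)
  }
  state
}

# Usage: minimize the Rosenbrock function with a checkpoint every 10 iterations.
rosenbrock <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
ckpt <- tempfile(fileext = ".rds")
fit <- checkpointed_optim(c(-1.2, 1), rosenbrock, file = ckpt,
                          chunk_maxit = 10L)

# Calling it again with the same file resumes from (or simply returns)
# the saved state instead of starting over.
fit_again <- checkpointed_optim(c(-1.2, 1), rosenbrock, file = ckpt,
                                chunk_maxit = 10L)
```

This only covers the case where the whole state fits in R variables, which is the limitation Duncan describes further down the thread; it does nothing for open connections, external processes, or compiled code with its own state.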

Teasing idea. Imagine if we could come up with some de facto standard API for this, and that such a framework could be called automatically by R, similar to how user interrupts are checked (e.g. R_CheckUserInterrupt()) on a regular basis by the R engine and throughout the R code. That could help with troubleshooting and debugging, e.g. sending the checkpoint to someone else or going backwards in time.

Pasting in the below since I failed to hit Reply *All* the other day, and it was only Richard who got it:

A few weeks ago, I played around with DMTCP (Distributed MultiThreaded CheckPointing) for Linux (https://github.com/dmtcp/dmtcp). I'm sharing in case someone is interested in investigating this further. Also, somewhere on the DMTCP wiki, they asked for testing with R by more experienced users.

"DMTCP is a tool to transparently checkpoint the state of multiple simultaneous applications, including multi-threaded and distributed applications. It operates directly on the user binary executable, without any Linux kernel modules or other kernel modifications."

They seem to be able to run this with HPC jobs, open files, Linux containers, and even MPI, and so on. I've only tested it very quickly with interactive R, and it seems to work. Obviously, more testing needs to be done to identify when it doesn't work. For example, I'd have a hard time believing it would work out of the box with local parallel PSOCK workers. They mention "plug-ins", so maybe there's a way to add support for specific use cases one by one.

Several academic HPC environments appear to use it, e.g.

* https://docs.nersc.gov/development/checkpoint-restart/dmtcp/
* http://wiki.orc.gmu.edu/mkdocs/Creating_Checkpoints_%28DMTCP%29/
* https://wiki.york.ac.uk/display/RCS/VK21%29+Checkpointing+with+DMTCP

That's all I have time for now,

Henrik

>
> -Andy
>
> On 12/13/21 11:51 AM, Duncan Murdoch wrote:
> > On 13/12/2021 12:58 p.m., Greg Minshall wrote:
> >> Jeff,
> >>
> >>> This sounds like an OS feature, not an R feature... certainly not a
> >>> portable R feature.
> >>
> >> i'm not arguing for it, but this seems to me like something that could
> >> be a language feature.
> >>
> >
> > R functions can call libraries written in other languages, and can start
> > processes, etc. R doesn't know everything going on in every function call,
> > and would have a lot of trouble saving it.
> >
> > If you added some limitations, e.g. a process that periodically has its
> > entire state stored in R variables, then it would be a lot easier.
> >
> > Duncan Murdoch
>
> --
> Andy Jacobson
> a...@yovo.org
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.