On 10/6/2019 9:23 AM, George Wm Turner wrote:
I stumbled across CRIU (Checkpoint/Restore In Userspace) https://criu.org/Main_Page a couple of weeks ago. I have not utilized it yet; it's on my ToDo list. They claim that it’s packaged with most distros; I checked RHEL/CentOS and it was there. Be careful of package/kernel versions, i.e., a good reason to go with the version included in your distro. BLCR was last updated January 2013; back in the day, it worked well enough for simpler apps; for complicated MPI apps, less so.
Thanks, George. I've installed it and started looking at it. At present I am applying it to a Grid Engine job, and have not figured out how to make it restore successfully. (Checkpointing goes all right, but gives a minor warning.) It does seem to require running as root, and of course my file systems are NFS mounted, which leads to issues. (Since I am just running some scratch things for testing, using 777 permissions (ouch!) seems to allow checkpointing to proceed.) I still need to understand a bit more of how it works and what flags I need :-) ... On the root requirement, maybe making the executable suid root is enough; I've not tried setting that yet.
Regards - EM
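For reference, here is a minimal sketch of the dump/restore command lines under discussion. It only builds the commands and does not run them. The helper names, the PID, and the /tmp/ckpt directory are hypothetical; the flags (-t, -D, -v4, -o, --shell-job, --leave-running) are standard criu options, but whether --shell-job is needed for a Grid Engine job is an assumption that depends on whether the job is attached to a terminal.

```python
import shlex


def criu_dump_cmd(pid, images_dir, shell_job=True, leave_running=False):
    """Build (not run) a `criu dump` command for the process tree rooted at pid."""
    cmd = ["criu", "dump", "-t", str(pid), "-D", images_dir,
           "-v4", "-o", "dump.log"]
    if shell_job:
        # Assumption: the target was started from a shell / has a controlling tty.
        cmd.append("--shell-job")
    if leave_running:
        # Keep the original process alive after the checkpoint is taken.
        cmd.append("--leave-running")
    return cmd


def criu_restore_cmd(images_dir, shell_job=True):
    """Build (not run) the matching `criu restore` command."""
    cmd = ["criu", "restore", "-D", images_dir, "-v4", "-o", "restore.log"]
    if shell_job:
        cmd.append("--shell-job")
    return cmd


if __name__ == "__main__":
    # Hypothetical PID and image directory, for illustration only.
    print(shlex.join(criu_dump_cmd(12345, "/tmp/ckpt")))
    print(shlex.join(criu_restore_cmd("/tmp/ckpt")))
```

These would typically be run as root (e.g. via sudo), and the verbose logs requested with -v4 and -o are usually the first place to look when a restore fails.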