if you want to look at checkpointing, it's worth going back to look at
Condor, because they made it really work. There are a few interesting
issues that you need to get right. You can't make it 50% of the way
there; that's not useful. You have to hit all the bits -- open /tmp
files, sockets, all of it. It's easy to get about 90% of it but the
last bits are a real headache. Nothing that's come along since has
really done the job (although various efforts claim to, you have to
read the fine print).

ron

Reply via email to