:I've lived through several checkpointing implementations. You've got the easy
:part. Applications must participate or such a facility has very limited
:usefulness. Delivering a signal is only part of the problem; there tend to
:be issues synchronizing user-mode checkpoint of application state with the
:kernel's desire to stop the process and squirrel away state.
:
:There's lots of stuff published about this; check the literature.
:
:	Sam

    Well, now it depends heavily on one's goals. There are a huge number
    of scientific jobs that only need the type of basic checkpointing that
    you see in, say, Linux, which I believe can only handle sbrk() space.
    Kip has taken it one step further with the file descriptor and mapping
    save/restore.

    It's kinda silly to poo-poo the work when the alternative is to have
    nothing at all. Being able to bite a chunk out of a significant
    scientific application-set is important. There's an obvious demand for
    even the very basic checkpointing capability that you see in Linux, and
    I personally believe that it can be done a whole lot better in a BSD
    environment.

    The work is also applicable to other things, like debugging. It's a
    better savecore than savecore, so to speak. With just a tiny bit of
    work one can checkpoint a running program, check-restore it into a
    stopped state, and attach GDB to it without interfering with the
    original process. You get the entire memory space and most of the
    descriptors *intact*, and you get a live duplicate of the process,
    making it possible to single step (at least up to a point) even a
    program that normally could not be checkpointed. I'll take that over
    the static image you get from a core file any day of the week!

    In a non-SSI environment there are limits (which have not yet been
    reached). In an SSI environment, however, which is one of DragonFly's
    goals, one needs only to add cluster-wide filehandles and a
    stall/restart capability, and the checkpoint code will be able to move
    the biggest chunk of the process, its anonymous memory, to another
    physical machine, with the rest of the pieces trailing behind. That is
    why the work is so exciting to me.

    Even if SSI is not one of your goals, the scientific and debugging
    benefits of the basic capability cannot be denied. You do want to
    compete a bit more with Linux, don't you? Well, this is how it starts.
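
    To make the "applications must participate" point concrete, here is a
    rough sketch of the user-mode side only. It is not Kip's code and not
    any kernel's interface. Everything in it is an assumption for
    illustration: SIGUSR1 stands in for whatever signal a checkpoint
    facility would deliver, and struct app_state, install_ckpt_handler(),
    checkpoint_if_requested() and restore_state() are hypothetical names.
    The handler only sets a flag, and the program saves its own state at a
    point it considers safe, which is the synchronization issue in its
    simplest form.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical application state; a real program has far more than this. */
struct app_state {
    long   step;
    double accum;
};

/* Set by the signal handler, consumed at a safe point in the main loop. */
static volatile sig_atomic_t ckpt_requested = 0;

static void on_ckpt_signal(int signo)
{
    (void)signo;
    ckpt_requested = 1;          /* only async-signal-safe work in here */
}

/* Assumption: SIGUSR1 stands in for the facility's checkpoint signal. */
static int install_ckpt_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_ckpt_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Call this where the application's state is consistent.
 * Returns 1 if a checkpoint was written, 0 if none was requested,
 * and -1 on I/O error.
 */
static int checkpoint_if_requested(const struct app_state *st, const char *path)
{
    FILE *fp;

    if (!ckpt_requested)
        return 0;
    ckpt_requested = 0;

    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    if (fwrite(st, sizeof(*st), 1, fp) != 1) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 1 : -1;
}

/* Reload state written by checkpoint_if_requested(); 0 on success. */
static int restore_state(struct app_state *st, const char *path)
{
    FILE  *fp = fopen(path, "rb");
    size_t n;

    if (fp == NULL)
        return -1;
    n = fread(st, sizeof(*st), 1, fp);
    fclose(fp);
    return n == 1 ? 0 : -1;
}
```

    A compute loop would call checkpoint_if_requested() once per iteration.
    Note that this covers only the data the application chooses to save;
    the descriptors and mappings are exactly the part that needs kernel
    help.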

					-Matt
					Matthew Dillon <[EMAIL PROTECTED]>