What you are looking for is checkpoint/restart support. Some details
are available at the link below:
http://osl.iu.edu/research/ft/ompi-cr/
Additionally, we fairly recently added the ability to checkpoint and
'stop' the application. This generates a usable checkpoint of the
application and then sends it SIGSTOP. The processes can be continued
with SIGCONT, or they can be killed (or otherwise removed from the
system) and later restarted from the checkpoint. Some details on this
feature are at the link below:
http://osl.iu.edu/research/ft/ompi-cr/examples.php#uc-ckpt-stop
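To illustrate just the stop/continue half of that description, here is a
minimal Python sketch of the underlying signal mechanism on an ordinary
child process. It is not Open MPI code and it takes no checkpoint. The
function names (stop_and_resume, demo) and the sleeping child are
assumptions made up for the illustration, and it only runs on POSIX
systems.

```python
import os
import signal
import subprocess
import sys


def stop_and_resume(proc):
    """Suspend a child with SIGSTOP, then resume it with SIGCONT.

    Hypothetical helper for illustration only; returns True if the
    kernel reported the child as stopped.
    """
    os.kill(proc.pid, signal.SIGSTOP)
    # WUNTRACED makes waitpid return once the child has stopped,
    # instead of only when it exits.
    _, status = os.waitpid(proc.pid, os.WUNTRACED)
    was_stopped = os.WIFSTOPPED(status)
    # A stopped process holds its memory but gets no CPU time until
    # it receives SIGCONT.
    os.kill(proc.pid, signal.SIGCONT)
    return was_stopped


def demo():
    # Stand-in for a long-running job: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        return stop_and_resume(child)
    finally:
        # Clean up the stand-in job so the sketch leaves nothing behind.
        child.kill()
        child.wait()


if __name__ == "__main__":
    print("child was stopped:", demo())
```

Note that a process stopped this way still occupies memory. Freeing the
memory is what the checkpoint is for: once it is on disk, the stopped
processes can be killed and restarted from it later.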
-- Josh
On Apr 13, 2010, at 10:28 AM, Ralph Castain wrote:
I believe that is called "checkpoint/restart" - see the FAQ page on
that subject.
On Apr 13, 2010, at 7:30 AM, Hoelzlwimmer Andreas - S0810595005 wrote:
Hi,
I found in the FAQ that it is possible to suspend/resume MPI jobs.
Would it also be possible to hibernate the jobs (free the memory,
serialize it to the hard drive) and continue/wake them up later,
possibly at different locations?
cheers,
Andreas
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users