What you are looking for is checkpoint/restart support. Some details are at the link below:
  http://osl.iu.edu/research/ft/ompi-cr/

Additionally, we fairly recently added the ability to checkpoint and 'stop' the application. This generates a usable checkpoint of the application and then sends SIGSTOP. The processes can be continued with SIGCONT, or they can be killed (or otherwise removed from the system) and later restarted from the checkpoint. Some details on this feature are at the link below:
  http://osl.iu.edu/research/ft/ompi-cr/examples.php#uc-ckpt-stop
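
For anyone unfamiliar with the stop/continue half of this, here is a minimal Python sketch of the signal semantics only. It does not take a checkpoint and does not use Open MPI at all: the child process is a hypothetical stand-in for one application process, and the function name stop_and_continue is made up for illustration.

```python
import os
import signal
import subprocess
import sys


def stop_and_continue():
    """Stop a child process with SIGSTOP, then resume it with SIGCONT.

    Returns (stopped, continued) as observed by the parent via waitpid.
    """
    # Hypothetical stand-in for an application process; a real MPI job
    # would be launched by mpirun rather than like this.
    proc = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        # SIGSTOP cannot be caught or ignored: the process is frozen
        # but keeps its memory and stays in the process table.
        os.kill(proc.pid, signal.SIGSTOP)
        _, status = os.waitpid(proc.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        # SIGCONT lets the same process carry on where it left off.
        os.kill(proc.pid, signal.SIGCONT)
        _, status = os.waitpid(proc.pid, os.WCONTINUED)
        continued = os.WIFCONTINUED(status)
    finally:
        # Clean up the stand-in process.
        proc.kill()
        proc.wait()
    return stopped, continued


if __name__ == "__main__":
    print(stop_and_continue())
```

A stopped process still holds its memory, which is why the checkpoint matters for the hibernate case: once the checkpoint exists, the stopped processes can be killed to free the node and restarted later from the checkpoint instead of being continued.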

-- Josh

On Apr 13, 2010, at 10:28 AM, Ralph Castain wrote:

I believe that is called "checkpoint/restart" - see the FAQ page on that subject.

On Apr 13, 2010, at 7:30 AM, Hoelzlwimmer Andreas - S0810595005 wrote:

Hi,

I found in the FAQ that it is possible to suspend/resume MPI jobs. Would it also be possible to hibernate the jobs (free their memory by serializing it to the hard drive) and continue/wake them up later, possibly at a different location?

cheers,
Andreas

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

