LAM/MPI was able to checkpoint/restart an entire MPI job, as you mention. Open MPI is now able to checkpoint/restart as well. In the past week I added a LAM/MPI-like checkpoint/restart implementation to the Open MPI trunk. In Open MPI we revisited many of the design decisions from the LAM/MPI development and improved on them quite a bit. At the moment there is no documentation on how to use it (egg on my face, actually). I'm working on the documentation, and I will send a note to the users list once it is available.

Cheers,
Josh

On Mar 21, 2007, at 1:18 PM, Thomas Spraggins wrote:

To migrate processes, you need to be able to checkpoint them.  I
believe that LAM/MPI is the only MPI implementation that allows this,
although I have never used LAM/MPI.

Good luck.

Tom Spraggins
t...@virginia.edu

On Mar 21, 2007, at 1:09 PM, Mohammad Huwaidi wrote:

Hello folks,

I am trying to write some fault-tolerance systems with the
following criteria:
1) Recover from any software/hardware crashes.
2) Dynamically shrink and grow.
3) Migrate processes among machines.

Does anyone have examples of code? What MPI platform is recommended
to accomplish such requirements?

I am using three MPI platforms and each has its own issues:
1) MPICH2 - good multi-threading support, but bad fault-tolerance
mechanisms.
2) Open MPI - does not support multi-threading properly and cannot
trap exceptions yet.
3) FT-MPI - Old and does not support multi-threading at all.

Any suggestions?
--

Regards,
Mohammad Huwaidi

We can't resolve problems by using the same kind of thinking we used
when we created them.
                                                --Albert Einstein
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

----
Josh Hursey
jjhur...@open-mpi.org
http://www.open-mpi.org/
