On Jun 29, 2010, at 9:35 PM, 王睿 wrote:

> Thanks for the feedback. More below:
> 
> Are there any MPI implementations that meet the following requirements:
> 
> 1. It doesn't terminate the whole job when a node dies?
> 
> 2. It allows a spare node to replace the dead node and take over its work?
> 
> As far as I know, FT-MPI meets both requirements, but it hasn't been updated 
> since 2004. Open MPI is said to combine several projects, including FT-MPI, 
> but so far it only provides checkpoint/restart as a means of fault tolerance. 

I know that the UT people have been working on such things over the past few 
years, but I don't know the current status.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

