The majority of the Open MPI developer community has clearly stated its 
official position: fault tolerance is not needed, so we (read this as the 
official version of Open MPI) do not support it.

However, a group of researchers has been working on a version of Open MPI 
that supports the latest fault tolerance proposal submitted to the MPI Forum 
for consideration. You can access it at 
https://bitbucket.org/jjhursey/ompi-ulfm-rts.

  george. 

On Jun 19, 2012, at 09:58, 陈松 wrote:

> Hi all,
> 
> Can anyone explain the fault tolerance features in Open MPI to me? I've read 
> the FAQs and some of the papers on this topic listed at open-mpi.org, but I 
> still can't figure out what happens with the fault tolerance mechanism in 
> Open MPI when one node of my supercomputer system fails during a computation, 
> and what we system administrators should do after the failure (or before it). 
> 
> Can anyone help me? My boss wants me to deploy Open MPI on our system because 
> he wants the fault tolerance feature.
> 
> Thanks very much.
> 
> 
> 
> ---------------
> CHEN Song
> R&D Department
> National Supercomputer Center in Tianjin
> Binhai New Area, Tianjin, China
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
