Hi all,

Can anyone explain the fault-tolerance features in OpenMPI to me? I've read
the FAQs and some of the papers on this topic listed on open-mpi.org, but I
still can't figure out what the fault-tolerance mechanism in OpenMPI does
when one node of my supercomputer system fails during a computation, or what
we system administrators should do after the failure (or before it). Can
anyone help me? My boss wants me to deploy OpenMPI on our system because he
wants the fault-tolerance feature.

Thanks very much.

---------------
CHEN Song
R&D Department
National Supercomputer Center in Tianjin
Binhai New Area, Tianjin, China
