2012/06/18 14:35:07 Auto-saved draft

Josh Hursey Wed, 20 Jun 2012 09:37:56 -0400

You are correct that the Open MPI project combined the efforts of a
few preexisting MPI implementations towards building a single,
extensible MPI implementation with the best features of the prior MPI
implementations. From the beginning of the project, the Open MPI
developer community has aimed to provide a solid, MPI-2 (soon MPI-3)
compliant MPI implementation. Features outside of the MPI standard,
such as fault tolerance, have been (and are) goals as well.

The fault tolerance efforts in Open MPI have been mostly pursued by
the research side of the community. As such, maintenance support for
these features is often challenging and a point of frequent discussion
in the core developer community. There are users for each of these
fault tolerance features/techniques, so they are important to provide.
Integrating these features into Open MPI without diminishing
performance, scalability, and usability is often a delicate software
engineering challenge. Per the prior comments on this thread, it can
often lead to heated debate. :)

In the Open MPI trunk and 1.6 release series there are a few fault
tolerance features that you might be interested in, each with a
different degree of functionality and support. Each of these features
is an advancement on the fault tolerance features of the LAM/MPI,
MPICH-V, FT-MPI, and LA-MPI projects.

Checkpoint/Restart support allows a user to manually (via a command
line tool) checkpoint and restart an MPI application, migrate
processes in the machine, and/or ask Open MPI to automatically restart
failed processes on spare resources. Additionally, the application can
use APIs to checkpoint/restart/migrate processes without using the
command line tools. This C/R technique is similar to the feature
provided by LAM/MPI, and was developed at Indiana University (as part
of my PhD work). For more details, see the link below:
  http://www.open-mpi.org/faq/?category=ft#cr-support

Message logging support was added a while back by UTK, but I am
uncertain about its current state. This technique is similar to the
features provided by the MPICH-V project. For more details, I think
the wiki page below describes the functionality:
  https://svn.open-mpi.org/trac/ompi/wiki/EventLog_CR

The MPI Forum standardization body's Fault Tolerance Working Group has
a proposal for application-managed fault tolerance. In essence this is
similar to the FT-MPI work, although the interface is quite a bit
different. This feature is not yet in the Open MPI trunk, but you can
find a beta release and more information at the link below:
  http://www.open-mpi.org/~jjhursey/projects/ft-open-mpi/

End-to-end data reliability worked at one point in time, but I do not
know if it is being maintained. This is similar to the fault tolerance
features found in LA-MPI. For information about that project see the
link below:
  http://www.open-mpi.org/faq/?category=ft#dr-support

There are also research projects that are exploring other fault
tolerance techniques above MPI, such as peer-based checkpointing and
replication. So far, these projects have tried to stay above the MPI
layer for portability, and have not requested any specific extensions
to Open MPI (with the possible exception of the work in the MPI Forum,
cited above). Below are links to two such projects, though there are
many others out there:
  http://sourceforge.net/projects/scalablecr/
  http://prod.sandia.gov/techlib/access-control.cgi/2011/112488.pdf

So that should give you an overview of the current state of fault
tolerance techniques in Open MPI. As for your question about what you
can expect if a process crashes in your Open MPI job: by default, Open
MPI will kill your entire MPI job, and the user will have to restart
the job either from the beginning of execution or from any checkpoint
files that the application has written. Open MPI defaults to killing
the entire MPI job since that is what most MPI applications expect, as
most use the default MPI error handler,
MPI_ERRORS_ARE_FATAL:
  http://www.netlib.org/utk/papers/mpi-book/node177.html
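For reference, restarting "from any checkpoint files that the
application has written" means the application itself periodically
saves its state and reloads it at startup. Below is a minimal sketch
of that pattern in plain C. The file format and the names
save_checkpoint/load_checkpoint are hypothetical illustrations, not
part of Open MPI. It writes to a temporary file and then renames it
over the previous checkpoint, so a crash in the middle of a write
leaves the older checkpoint intact (rename replaces the target
atomically on POSIX file systems).

```c
#include <stdio.h>

/* Hypothetical checkpoint format (not an Open MPI API):
 * [long iteration][size_t n][n doubles], raw binary. */

/* Save state to "<path>.tmp", then rename it over <path>.
 * Returns 0 on success, -1 on failure. There is no fsync here, so this
 * protects against process crashes rather than power loss. */
int save_checkpoint(const char *path, long iteration,
                    const double *data, size_t n)
{
    char tmp[4096];
    int len = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (len < 0 || (size_t)len >= sizeof tmp)
        return -1;                      /* path too long */

    FILE *f = fopen(tmp, "wb");
    if (!f)
        return -1;
    int ok = fwrite(&iteration, sizeof iteration, 1, f) == 1
          && fwrite(&n, sizeof n, 1, f) == 1
          && fwrite(data, sizeof *data, n, f) == n;
    if (fclose(f) != 0)                 /* fclose flushes buffered data */
        ok = 0;
    if (!ok) {
        remove(tmp);                    /* discard the partial file */
        return -1;
    }
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* Load a checkpoint into data (room for `capacity` doubles).
 * Returns the saved iteration (assumed >= 0), or -1 if no usable
 * checkpoint exists, in which case the caller starts from the beginning. */
long load_checkpoint(const char *path, double *data, size_t capacity)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;                      /* no checkpoint yet */
    long iteration;
    size_t n;
    int ok = fread(&iteration, sizeof iteration, 1, f) == 1
          && fread(&n, sizeof n, 1, f) == 1
          && n <= capacity              /* reject oversized/corrupt data */
          && fread(data, sizeof *data, n, f) == n;
    fclose(f);
    return ok ? iteration : -1;
}
```

In an MPI job, each rank might call load_checkpoint once at startup
(a return of -1 meaning "start from iteration 0") and save_checkpoint
every few iterations, each rank using its own file name (for example a
hypothetical ckpt.<rank>.bin). The files are raw binary, so they are
only portable between machines with the same type sizes and byte order.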

Last I checked, the current Open MPI trunk will terminate the entire
job even if the user set MPI_ERRORS_RETURN on their communicators. A
reason for this is that the behavior of MPI after returning such an
error is undefined. The MPI Forum Fault Tolerance Working Group is
working to define this behavior, so if this is of interest, see the
MPI Forum work cited above.

If you want a fault tolerance feature, such as automatic
checkpoint/restart recovery, you will need to create a build of Open
MPI with that feature enabled. There are instructions on the various
links above about how to do so.

If you are particularly interested in one feature or have a strong use
case for a set of features, then that is important information for the
Open MPI developer community. It will help us, as a project,
prioritize the maintenance of the various features in Open MPI.

Best of luck,
Josh

On Wed, Jun 20, 2012 at 2:59 AM, 陈松 <chens...@nscc-tj.gov.cn> wrote:
> As far as I know, OMPI combines the fault tolerance features of FT-MPI,
> LA-MPI and LAM/MPI. Is this statement still correct? Or, as you say, does
> OMPI support only checkpoint/restart (as in LAM/MPI)? I don't know the
> details of FT-MPI or LA-MPI; aren't they useful or necessary?
>
> In fact, what I really want to know is: suppose I run a job on N processors
> with OMPI, and one (or some) of these processors crashes. What would the
> fault-tolerant mechanism of OMPI do? And what should the sys-admin do
> (e.g., restart the crashed node)?
>
> In my understanding, after the crash, the sys-admin should restart the
> crashed node (if it can be restarted) and then do the rollback with some
> sort of command, while OMPI would suspend all the computing processes,
> waiting for the rollback command. Is this correct?
>
> thanks again.
>
>
>
> --------- Original Message ---------
> From: "Open MPI Users" <us...@open-mpi.org>
> To: "Open MPI Users" <us...@open-mpi.org>
> Subject: Re: [OMPI users] 2012/06/18 14:35:07 Auto-saved draft
> Date: 2012/06/20 01:26:08, Wednesday
>
>
> That's a little bit strong - OMPI still supports checkpoint/restart as a
> fault tolerance mechanism. There really isn't anything the sys admin has to
> do, though - what is required is that users periodically order their
> programs to checkpoint so they can be restarted after a failure.
>
> Checkpointing is typically done either by the app itself (say, when it
> reaches some point it feels is a good one to save), or using a script that
> just orders a checkpoint every so many seconds.
>
> What we have said is that we don't believe the FT "run thru failure"
> position pushed by UTK is particularly required at this time. Partly a
> question of impact vs benefit, mostly due to competing approaches offering
> equivalent fault recovery capability with less impact. But that's a separate
> discussion.
>
>
> On Jun 19, 2012, at 11:16 AM, George Bosilca wrote:
>
> It has been clearly stated that the official position pushed forward by a
> majority of the Open MPI developer community is that fault tolerance is not
> needed so we (read this as the official version of Open MPI) do not support
> it.
>
> However, a group of researchers have been working toward a version of Open
> MPI that supports the last fault tolerance proposal submitted for
> consideration to the MPI Forum. You can access it
> at https://bitbucket.org/jjhursey/ompi-ulfm-rts.
>
>   george.
>
> On Jun 19, 2012, at 09:58 , 陈松 wrote:
>
> Hi all,
>
> Can anyone explain to me the fault tolerance features in OpenMPI? I've read
> the FAQs and some papers about this topic listed on open-mpi.org, but I
> still can't figure out what the fault tolerance mechanism in OpenMPI would
> do when one node of our supercomputer fails during computation, and what we
> system administrators should do after the failure (or before).
>
> Can anyone help me? My boss wants me to deploy OpenMPI on our system because
> he wants the fault tolerance feature.
>
> Thanks very much.
>
>
>
> ---------------
> CHEN Song
> R&D Department
> National Supercomputer Center in Tianjin
> Binhai New Area, Tianjin, China
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey

[OMPI users] Re: [OMPI users] Re: Re: [OMPI users] 2012/06/18 14:35:07 Auto-saved draft
