I appreciate your help.

Indeed, it is better to create my own mechanism, as Lloyd mentioned. My
application is a framework for stream processing (something like IBM
System-S), in which I use Open MPI as the communication layer and for part
of the process management. One of the framework's features is a dynamic
load-balancing mechanism. In some situations I need to move processes
between machines or temporarily suspend their execution. To achieve this, I
need a checkpoint/restart mechanism. That is the reason for my question.
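
To make the application-level approach concrete, here is a minimal sketch
of what I have in mind, written in Python for brevity. It is only an
illustration under assumptions: OperatorState, save_checkpoint and
load_checkpoint are hypothetical names, not part of Open MPI or of any
existing framework, and the state fields are placeholders for whatever a
real operator would need to keep.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class OperatorState:
    """Hypothetical state of one stream-processing process (assumption)."""
    rank: int
    tuples_processed: int = 0
    window: list = field(default_factory=list)


def save_checkpoint(state, path):
    """Serialize only the data the application considers important.

    The file is written to a temporary name and then renamed, so a crash
    during the write never leaves a truncated checkpoint at `path`.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(asdict(state), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic when both are on one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_checkpoint(path):
    """Rebuild the state, e.g. on the machine the process was moved to."""
    with open(path) as f:
        return OperatorState(**json.load(f))
```

A process that has to be suspended or moved would call save_checkpoint
before exiting, and its replacement would call load_checkpoint on a path
that both machines can reach (for example a shared filesystem, which is
also an assumption here). Messages in flight are not covered by this
sketch and would need separate handling.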

Thanks again.


Rodrigo Silva Oliveira
M.Sc. Student - Computer Science
Universidade Federal de Minas Gerais
www.dcc.ufmg.br/~rsilva <http://www.dcc.ufmg.br/%7Ersilva>




On Thu, Jan 19, 2012 at 1:18 PM, Lloyd Brown <lloyd_br...@byu.edu> wrote:

> Since you're looking for a function call, I'm going to assume that you
> are writing this application, and it's not a pre-compiled, commercial
> application.  Given that, it's going to be significantly better to have
> an internal application checkpointing mechanism, where it serializes and
> stores the data, etc., than to use an external, application-agnostic
> checkpointing mechanism like BLCR or similar.  The application should be
> aware of what data is important, how to most efficiently store it, etc.
>  A generic library has to assume that everything is important, and store
> it all.
>
> Don't get me wrong.  Libraries like BLCR are great for applications that
> don't have that visibility, and even as a tool for the
> application-internal checkpointing mechanism (where the application
> deliberately interacts with the library to annotate what's important to
> store, and how to do so, etc.).  But if you're writing the application,
> you're better off handling it internally than externally.
>
> Lloyd Brown
> Systems Administrator
> Fulton Supercomputing Lab
> Brigham Young University
> http://marylou.byu.edu
>
> On 01/19/2012 08:05 AM, Josh Hursey wrote:
> > Currently Open MPI only supports the checkpointing of the whole
> > application. There has been some work on uncoordinated checkpointing
> > with message logging, though I do not know the state of that work with
> > regards to availability. That work has been undertaken by the University
> > of Tennessee Knoxville, so maybe they can provide more information.
> >
> > -- Josh
> >
> > On Wed, Jan 18, 2012 at 3:24 PM, Rodrigo Oliveira
> > <rsilva.olive...@gmail.com <mailto:rsilva.olive...@gmail.com>> wrote:
> >
> >     Hi,
> >
> >     I'd like to know if there is a way to checkpoint a specific process
> >     running under an mpirun call. In other words, is there a function
> >     CHECKPOINT(rank) in which I can pass the rank of the process I want
> >     to checkpoint? I do not want to checkpoint the entire application,
> >     but just one of its processes.
> >
> >     Thanks
> >
> >     _______________________________________________
> >     users mailing list
> >     us...@open-mpi.org <mailto:us...@open-mpi.org>
> >     http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >
> >
> > --
> > Joshua Hursey
> > Postdoctoral Research Associate
> > Oak Ridge National Laboratory
> > http://users.nccs.gov/~jjhursey
