I appreciate your help. Indeed, it's better to create my own mechanism, as Lloyd mentioned. Actually, my application is a framework for stream processing (something like IBM System-S), in which I use Open MPI as the communication layer and for part of the process management. One of the framework's features is a dynamic load-balancing mechanism. In some situations I need to move processes between machines or temporarily suspend their execution. To achieve this, I need a checkpoint/restart mechanism. That is the reason for my question.
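To make the idea concrete, here is a rough sketch of the kind of application-level checkpoint I have in mind, following Lloyd's suggestion that the application itself serializes only the data it considers important. It is written in Python for brevity and is not an Open MPI API: OperatorState, save_checkpoint and load_checkpoint are hypothetical names, the fields are invented for illustration, and it assumes the checkpoint file is on storage that the target machine can also read.

```python
import os
import pickle
import tempfile


class OperatorState:
    """Hypothetical state of one stream-processing operator."""

    def __init__(self, rank, next_offset=0, counts=None):
        self.rank = rank                # rank of the process that owns this state
        self.next_offset = next_offset  # next stream item to consume after restart
        self.counts = counts or {}      # example of application data worth keeping


def save_checkpoint(state, path):
    """Write the state to `path` atomically so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(vars(state), f)  # store only the fields, not the whole process
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)       # atomic rename within the same directory
    except BaseException:
        os.unlink(tmp_path)              # do not leave a stale temporary file behind
        raise


def load_checkpoint(path):
    """Rebuild the state from a checkpoint file, for example on another machine."""
    with open(path, "rb") as f:
        fields = pickle.load(f)
    return OperatorState(**fields)
```

A process that is about to be moved or suspended would call save_checkpoint(state, path) at a point where no message is half-processed, and the replacement process would call load_checkpoint(path) before it resumes consuming the stream. Messages that are still in flight at that moment are not covered by this sketch and would need separate handling.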
Thanks again.

Rodrigo Silva Oliveira
M.Sc. Student - Computer Science
Universidade Federal de Minas Gerais
www.dcc.ufmg.br/~rsilva

On Thu, Jan 19, 2012 at 1:18 PM, Lloyd Brown <lloyd_br...@byu.edu> wrote:
> Since you're looking for a function call, I'm going to assume that you
> are writing this application, and it's not a pre-compiled, commercial
> application. Given that, it's going to be significantly better to have
> an internal application checkpointing mechanism, where it serializes and
> stores the data, etc., than to use an external, application-agnostic
> checkpointing mechanism like BLCR or similar. The application should be
> aware of what data is important, how to most efficiently store it, etc.
> A generic library has to assume that everything is important, and store
> it all.
>
> Don't get me wrong. Libraries like BLCR are great for applications that
> don't have that visibility, and even as a tool for the
> application-internal checkpointing mechanism (where the application
> deliberately interacts with the library to annotate what's important to
> store, and how to do so, etc.). But if you're writing the application,
> you're better off to handle it internally, than externally.
>
> Lloyd Brown
> Systems Administrator
> Fulton Supercomputing Lab
> Brigham Young University
> http://marylou.byu.edu
>
> On 01/19/2012 08:05 AM, Josh Hursey wrote:
> > Currently Open MPI only supports the checkpointing of the whole
> > application. There has been some work on uncoordinated checkpointing
> > with message logging, though I do not know the state of that work with
> > regards to availability. That work has been undertaken by the University
> > of Tennessee Knoxville, so maybe they can provide more information.
> >
> > -- Josh
> >
> > On Wed, Jan 18, 2012 at 3:24 PM, Rodrigo Oliveira
> > <rsilva.olive...@gmail.com> wrote:
> >
> >     Hi,
> >
> >     I'd like to know if there is a way to checkpoint a specific process
> >     running under an mpirun call. In other words, is there a function
> >     CHECKPOINT(rank) in which I can pass the rank of the process I want
> >     to checkpoint? I do not want to checkpoint the entire application,
> >     but just one of its processes.
> >
> >     Thanks
> >
> >     _______________________________________________
> >     users mailing list
> >     us...@open-mpi.org
> >     http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > --
> > Joshua Hursey
> > Postdoctoral Research Associate
> > Oak Ridge National Laboratory
> > http://users.nccs.gov/~jjhursey