Anton,

I don't know if there is a usual or typical way of initiating a checkpoint amongst 
various resource managers. I know that the BLCR folks (I believe Eric Roman is 
heading this effort - CC'ed) have been investigating a tighter integration of 
Open MPI, BLCR and Torque. He might be able to give you a bit more guidance on 
this topic.

-- Josh

On Feb 10, 2010, at 11:54 PM, Anton Starikov wrote:

> Hi!
> I'm trying to implement checkpointing on our cluster, and I have an obvious 
> question.
> 
> I guess this has been implemented many times by other users, so I would like 
> someone to share their experience with me.
> 
> With serial/multithreaded jobs it is kind of clear. But for parallel?
> 
> We have "fat" 16-core nodes, so users run both OpenMP and MPI programs.
> 
> Shall I just perform some checks in my checkpointing script and call 
> ompi-checkpoint if, after the tests, I figure out that it is an MPI job?
> 
> What is the "usual" way?
> 
> Best,
> 
> Anton
> 
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

