Anton,

I don't know if there is a usual or typical way of initiating a checkpoint across the various resource managers. I know that the BLCR folks (I believe Eric Roman is heading this effort - CC'ed) have been investigating a tighter integration of Open MPI, BLCR and Torque. He might be able to give you a bit more guidance on this topic.
-- Josh

On Feb 10, 2010, at 11:54 PM, Anton Starikov wrote:

> Hi!
>
> I'm trying to implement checkpointing on our cluster, and I have an obvious
> question.
>
> I guess this has been implemented many times by other users, so I would like
> someone to share their experience with me.
>
> With serial/multithreaded jobs it is kind of clear. But for parallel?
>
> We have "fat" 16-core nodes, so users run both OpenMP and MPI programs.
>
> Shall I just perform some checks in my checkpointing script and call
> ompi-checkpoint if, after the tests, I figure out that there is an MPI job?
>
> What is the "usual" way?
>
> Best,
>
> Anton