On Jan 14, 2010, at 2:50 AM, Andreea Costea wrote:

> Hey there,
>
> I have some questions regarding checkpoint/restart:
>
> 1. Until recently I thought that ompi-checkpoint and ompi-restart are used to
> checkpoint a process inside an MPI application. Now I reread this and I
> realized that what it actually does is checkpoint the mpirun process. Does
> this mean that if I run my application with multiple processes on
> multiple nodes in my network, the checkpoint file will contain the states of
> all the processes of my MPI application?
I think you slightly misread the entry. ompi-checkpoint checkpoints the entire MPI application, across node boundaries. It requires the user to pass the PID of mpirun to serve as a reference point for the command. This way a user can run multiple mpiruns from the same machine and checkpoint only a subset of them.

> 2. Can I restart the application on a different node?

Yes. If you have trouble doing this, I would suggest following the directions in the BLCR FAQ entry below (it usually addresses 99% of the problems people have with this):
  https://upc-bugs.lbl.gov//blcr/doc/html/FAQ.html#prelink

-- Josh

>
> Thanks a lot,
> Andreea
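For anyone finding this thread later, here is a minimal Python sketch of the PID-based workflow described above. It is an illustration under assumptions, not part of Open MPI: the helper names (find_mpirun_pids, checkpoint_command, restart_command) are hypothetical, the /proc scan is Linux-only, and it assumes mpirun shows up in /proc/<pid>/comm as either "mpirun" or "orterun". It only builds and prints command lines; it does not run ompi-checkpoint or ompi-restart itself.

```python
import os
from typing import List

# Assumption: mpirun is commonly a link to orterun in Open MPI, so the
# process name in /proc/<pid>/comm may be either of these.
MPIRUN_NAMES = ("mpirun", "orterun")


def find_mpirun_pids(proc_root: str = "/proc") -> List[int]:
    """Return the sorted PIDs of local mpirun processes (Linux /proc only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # process exited or is not readable
        if name in MPIRUN_NAMES:
            pids.append(int(entry))
    return sorted(pids)


def checkpoint_command(mpirun_pid: int) -> List[str]:
    """ompi-checkpoint takes the PID of mpirun, not of an individual MPI rank."""
    return ["ompi-checkpoint", str(mpirun_pid)]


def restart_command(snapshot_ref: str) -> List[str]:
    """ompi-restart takes the global snapshot reference reported by ompi-checkpoint."""
    return ["ompi-restart", snapshot_ref]


if __name__ == "__main__":
    # Each mpirun on this machine is a separate job; checkpointing one PID
    # leaves the others untouched.
    for pid in find_mpirun_pids():
        print(" ".join(checkpoint_command(pid)))
```

Because the PID identifies one whole job, picking a single entry from find_mpirun_pids() checkpoints that application on every node it spans and nothing else. restart_command() is what you would run to restart, possibly from a different node, assuming the snapshot is reachable there.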