On Jan 14, 2010, at 2:50 AM, Andreea Costea wrote:

> Hi there
> 
> I have some questions regarding checkpoint/restart:
> 
> 1. Until recently I thought that ompi-checkpoint and ompi-restart are used to 
> checkpoint a process inside an MPI application. Now I reread this and I 
> realized that actually what it does is to checkpoint the mpirun process. Does 
> this mean that if I run my application with multiple processes and on 
> multiple nodes in my network the checkpoint file will contain the states of 
> all the processes of my MPI application?

I think you slightly misread the entry. ompi-checkpoint checkpoints the entire 
MPI application, across node boundaries. It requires that the user pass the PID 
of mpirun to serve as a reference point for the command. This way a user can 
run multiple mpiruns from the same machine and only checkpoint a subset of 
those.
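
As a rough sketch of that workflow: the small Python helper below only builds 
the command lines one might run, it does not launch anything. The job labels 
and PIDs are made up for illustration, and the snapshot name 
ompi_global_snapshot_<PID>.ckpt is an assumption about the default naming; use 
whatever snapshot reference ompi-checkpoint actually reports on your system.

```python
from typing import Dict, List


def checkpoint_command(mpirun_pid: int) -> List[str]:
    """Build the ompi-checkpoint call for one job, identified by its mpirun PID."""
    return ["ompi-checkpoint", str(mpirun_pid)]


def assumed_snapshot_ref(mpirun_pid: int) -> str:
    """ASSUMPTION: default global snapshot reference is named after the mpirun PID."""
    return "ompi_global_snapshot_{}.ckpt".format(mpirun_pid)


def restart_command(snapshot_ref: str) -> List[str]:
    """Build the ompi-restart call for a previously taken global snapshot."""
    return ["ompi-restart", snapshot_ref]


def checkpoint_subset(jobs: Dict[str, int], selected: List[str]) -> List[List[str]]:
    """Checkpoint only some of the mpiruns started from this machine.

    `jobs` maps a hypothetical job label to the PID of its mpirun process.
    """
    return [checkpoint_command(jobs[label]) for label in selected]


if __name__ == "__main__":
    # Hypothetical PIDs of three mpirun processes started from the same machine.
    jobs = {"solver": 4101, "mesh": 4188, "viz": 4250}
    for cmd in checkpoint_subset(jobs, ["solver", "viz"]):
        print(" ".join(cmd))
    print(" ".join(restart_command(assumed_snapshot_ref(jobs["solver"]))))
```

Each command targets one mpirun, so the "mesh" job above is left alone while 
all processes of the other two jobs are checkpointed, on whichever nodes they 
run.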

> 2. Can I restart the application on a different node? 

Yes. If you have trouble doing this, then I would suggest following the 
directions in the BLCR FAQ entry below (it usually addresses 99% of the 
problems people have doing this):
  https://upc-bugs.lbl.gov//blcr/doc/html/FAQ.html#prelink

-- Josh

> 
> Thanks a lot,
> Andreea
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

