I asked this question because checkpointing to NFS succeeds, but checkpointing without a mounted filesystem or shared storage produces the following warning and error:

WARNING: Could not preload specified file: File already exists.
Fileset: /home/andreea/checkpoints/global/ompi_global_snapshot_7426.ckpt/0
Host: X
Will continue attempting to launch the process.
filem:rsh: wait_all(): Wait failed (-1)
[[62871,0],0] ORTE_ERROR_LOG: Error in file snapc_full_global.c at line 1054

This happens even when I set the MCA parameters as follows:

snapc_base_store_in_place=0
crs_base_snapshot_dir=/home/andreea/checkpoints/local
snapc_base_global_snapshot_dir=/home/andreea/checkpoints/global

and the nodes can connect to each other through ssh without a password.

Thanks,
Andreea

On Mon, Feb 8, 2010 at 12:59 PM, Andreea Costea <andre.cos...@gmail.com> wrote:
> Hi,
>
> Let's say I have an MPI application running on several hosts. Is there any
> way to checkpoint this application without shared storage between
> the nodes?
> I already took a look at the examples here
> http://www.osl.iu.edu/research/ft/ompi-cr/examples.php, but it seems that
> in both cases there is a globally mounted file system.
>
> Thanks,
> Andreea