I have created the shared file system, but I created it as /mirror in the
root directory, not in the $HOME directory. Is that the problem?
Thank you.
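
One way to see whether $HOME and /mirror actually sit on a shared file system is to print the file system type behind each directory on every node. The following is only a rough sketch: the helper names fs_kind and check_dir are made up for illustration, the list of network file system types is an assumption and not exhaustive, and it relies on GNU stat as found on Linux.

```shell
#!/bin/sh
# Hypothetical helper: classify a file system type name as shared or local.
# The set of network types listed here is illustrative, not exhaustive.
fs_kind() {
    case "$1" in
        nfs*|cifs|smb*|lustre|gpfs) echo shared ;;
        *) echo local ;;
    esac
}

# Hypothetical helper: report the file system type backing a directory.
# Uses GNU stat (-f queries the file system, %T prints its type name).
check_dir() {
    fstype=$(stat -f -c %T "$1") || return 1
    echo "$1: $fstype ($(fs_kind "$fstype"))"
}

# Example use, to be run on each node:
# check_dir "$HOME"
# check_dir /mirror
```

If $HOME shows up as local on the compute nodes while /mirror shows up as shared, that would match the situation asked about in the quoted reply.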

cheers
fengguang

On Tue, Mar 23, 2010 at 10:23 AM, Fernando Lemos <fernando...@gmail.com> wrote:

> On Mon, Mar 22, 2010 at 8:20 PM, fengguang tian <ferny...@gmail.com> wrote:
> > I set up a cluster of 18 nodes using Open MPI and the BLCR library, and
> > the MPI program runs well on the cluster.
> > But how do I checkpoint the MPI program on this cluster?
> > For example, here is what I do for a test:
> > mpiu@nimbus: /mirror$ mpirun -np 50 --hostfile .mpihostfile -am ft-enable-cr hellompi
> > The program then runs on the cluster.
> > Then I enter:
> > mpiu@nimbus: /mirror$ ompi-checkpoint -term $(pidof mpirun)
> >
> > But the MPI program is not terminated as it was on a single machine,
> > although it created a checkpoint file "ompi_global_snapshot_14030.ckpt"
> > in the home directory on the master node.
>
> Are you using OpenMPI 1.4 without a shared file system mounted at
> $HOME? If yes, then take a look here:
>
> http://www.open-mpi.org/community/lists/users/2010/03/12246.php
>
> Regards,
