On Tue, Mar 23, 2010 at 12:55 PM, fengguang tian <ferny...@gmail.com> wrote:
>
> I use mpirun -np 50 -am ft-enable-cr --mca snapc_base_global_snapshot_dir
> --hostfile .mpihostfile xxxx
> to store the global checkpoint snapshot into the shared
> directory:/mirror,but the problems are still there,
> when ompi-checkpoint, the mpirun is still not killed,it is hanging
> there.when doing ompi-restart, it shows:
>
> mpiu@nimbus:/mirror$ ompi-restart ompi_global_snapshot_333.ckpt/
> --------------------------------------------------------------------------
> Error: The filename (ompi_global_snapshot_333.ckpt/) is invalid because
> either you have not provided a filename
> or provided an invalid filename.
> Please see --help for usage.
>
> --------------------------------------------------------------------------
Have you tried Open MPI 1.5? I got checkpoint/restart working with 1.5 but not with 1.4, although I didn't try 1.4 with a shared filesystem.