On Mon, Apr 12, 2010 at 7:36 AM, Hideyuki Jitsumoto
<jitum...@gsic.titech.ac.jp> wrote:
> Hi Members,
>
> I tried to use checkpoint/restart with Open MPI,
> but I cannot get correct checkpoint data.
> I prepared the execution environment as follows. The strings in () are
> the names of the output files attached to my next e-mail (split off
> because of the mail size limit):
>
> 1. Installed BLCR and checked that it works correctly with "make check"
> 2. Ran ./configure with some parameters in the Open MPI source dir
> (config.output / config.log)
> 3. Ran make and make install (make.output.2 / install.output.2)
> 4. Confirmed that mca_crs_blcr.[la|so] and mca_crs_self.[la|so] are in
> /${INSTALL_DIR}/lib/openmpi
> 5. Created ~/.openmpi/mca-params.conf (mca-params.conf)
> 6. Compiled NPB and ran it with -am ft-enable-cr
> 7. Invoked ompi-checkpoint <MPIRUN_PID>
>
> As a result, I got the message "Checkpoint failed: no processes
> checkpointed." (cr_test_cg)

Are you using a shared file system? You need to use a shared file
system for checkpointing with 1.4.1:

https://svn.open-mpi.org/trac/ompi/ticket/2139
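
As a rough sketch of what that could look like in ~/.openmpi/mca-params.conf:
the directory below is a hypothetical placeholder, and the parameter name is
the one I recall from the 1.3/1.4-series checkpoint/restart documentation, so
please verify it with ompi_info on your installation before relying on it.

```
# Assumption: /shared/ckpt is a hypothetical directory that is mounted at the
# same path on every node (for example over NFS) and is writable by your user.
# The global snapshot is written here when ompi-checkpoint is invoked.
snapc_base_global_snapshot_dir=/shared/ckpt
```

If the directory is only visible on some of the nodes, the processes on the
other nodes have nowhere to write their part of the global snapshot.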

Regards,
