Fernando,

Thank you for your reply. I tried patching the file you mentioned, but the output did not change.

> Are you using a shared file system? You need to use a shared file
> system for checkpointing with 1.4.1:

What is a shared file system? Do you mean NFS, Lustre, and so on? (I apologize for my ignorance.)
If I use only one node for the application, do I still need a shared file system?

On Mon, Apr 12, 2010 at 9:41 PM, Fernando Lemos <fernando...@gmail.com> wrote:
> On Mon, Apr 12, 2010 at 7:36 AM, Hideyuki Jitsumoto
> <jitum...@gsic.titech.ac.jp> wrote:
>> Hi Members,
>>
>> I tried to use checkpoint/restart with Open MPI,
>> but I cannot get correct checkpoint data.
>> I prepared the execution environment as follows. The strings in () are
>> the names of the output files attached to the next e-mail (because of
>> the mail size limit):
>>
>> 1. installed BLCR and checked that BLCR is working correctly with "make check"
>> 2. executed ./configure with some parameters in the Open MPI source dir
>> (config.output / config.log)
>> 3. executed make and make install (make.output.2 / install.output.2)
>> 4. confirmed that mca_crs_blcr.[la|so] and mca_crs_self.[la|so] exist in
>> /${INSTALL_DIR}/lib/openmpi
>> 5. created ~/.openmpi/mca-params.conf (mca-params.conf)
>> 6. compiled NPB and executed it with -am ft-enable-cr
>> 7. invoked ompi-checkpoint <MPIRUN_PID>
>>
>> As a result, I got the message "Checkpoint failed: no processes checkpointed."
>> (cr_test_cg)
>
> Are you using a shared file system? You need to use a shared file
> system for checkpointing with 1.4.1:
>
> https://svn.open-mpi.org/trac/ompi/ticket/2139
>
> Regards,

--
Sincerely Yours,
Hideyuki Jitsumoto (jitum...@gsic.titech.ac.jp)
Tokyo Institute of Technology
Global Scientific Information and Computing Center (Matsuoka Lab.)
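
On the shared file system question in this thread: one way to see whether a checkpoint directory sits on a network or parallel file system such as NFS or Lustre is to look at the mount table. Below is a minimal sketch in Python, not an Open MPI tool. It assumes a Linux-style mount table (the `/proc/mounts` layout: device, mount point, type, ...). The helper names `fs_type_for` and `is_shared`, and the short list of "shared" types, are hypothetical choices made for illustration and are not exhaustive.

```python
import os

# Assumption: a small, non-exhaustive set of network/parallel file system types.
SHARED_FS_TYPES = {"nfs", "nfs4", "lustre", "gpfs"}


def fs_type_for(path, mounts_text):
    """Return the file system type of the mount that contains `path`.

    `mounts_text` is the text of a mount table in /proc/mounts layout.
    The longest matching mount point wins; on a tie, the later entry wins.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so "/home" does not match "/homework".
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_shared(path, mounts_text):
    """True if `path` lives on one of the types listed in SHARED_FS_TYPES."""
    return fs_type_for(path, mounts_text) in SHARED_FS_TYPES
```

On a Linux node, passing the contents of `/proc/mounts` together with the directory where the snapshots are written would indicate whether that directory is on one of the listed file system types. It does not show whether every node mounts the same export, so that still needs to be checked on each node.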