Fernando,

Thank you for your reply.
I tried to patch the file you mentioned, but the output did not change.

> Are you using a shared file system? You need to use a shared file
> system for checkpointing with 1.4.1:
What do you mean by a shared file system? Do you mean NFS, Lustre, and so on?
(I'm sorry for my ignorance...)

If I run the application on only one node, do I still need a shared file system?


On Mon, Apr 12, 2010 at 9:41 PM, Fernando Lemos <fernando...@gmail.com> wrote:
> On Mon, Apr 12, 2010 at 7:36 AM, Hideyuki Jitsumoto
> <jitum...@gsic.titech.ac.jp> wrote:
>> Hi Members,
>>
>> I tried to use checkpoint/restart with Open MPI,
>> but I cannot get correct checkpoint data.
>> I prepared the execution environment as follows. The strings in () are
>> the names of the output files attached to the next e-mail (because of
>> the mail size limit):
>>
>> 1. installed BLCR and checked BLCR is working correctly by "make check"
>> 2. executed ./configure with some parameters on openMPI source dir
>> (config.output / config.log)
>> 3. executed make and make install (make.output.2 / install.output.2)
>> 4. confirmed that mca_crs_blcr.[la|so] and mca_crs_self.[la|so] exist in
>> /${INSTALL_DIR}/lib/openmpi
>> 5. created ~/.openmpi/mca-params.conf (mca-params.conf)
>> 6. compiled NPB and executed with -am ft-enable-cr
>> 7. invoked ompi-checkpoint <MPIRUN_PID>
>>
>> As a result, I got the message "Checkpoint failed: no processes checkpointed."
>> (cr_test_cg)
>
> Are you using a shared file system? You need to use a shared file
> system for checkpointing with 1.4.1:
>
> https://svn.open-mpi.org/trac/ompi/ticket/2139
>
> Regards,
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Sincerely Yours,
Hideyuki Jitsumoto (jitum...@gsic.titech.ac.jp)
Tokyo Institute of Technology
Global Scientific Information and Computing center (Matsuoka Lab.)
