Hi Josh,

Thank you for your reply.

I applied the patch from Ticket #2139 to openmpi-1.4.1 and reinstalled everything from scratch. After that, checkpointing worked correctly. The earlier failure was probably caused by a mistake in how I prepared my environment.
# I cannot reproduce the environment in which the failure occurred.
# I'm very sorry that I cannot identify the true cause of the malfunction
# and cannot send any further information.
# I am now using openmpi-1.4.2, and it works well without any patch
# (except for ompi_info).

>> In addition, when I checked the ompi_info output as in your demo movie, I got
>> "MCA crs: none (MCA v2.0, API v2.0, Component v1.4.1)" (open_info.output)
>
> This is actually a known bug with ompi_info. I have a fix in the works for
> it, and it should be available soon. Until then the ticket is linked below:
> https://svn.open-mpi.org/trac/ompi/ticket/2097

Thank you, I'll try it.

On Wed, May 19, 2010 at 3:46 AM, Josh Hursey <jjhur...@open-mpi.org> wrote:
> (Sorry for the delay in replying, more below)
>
> On Apr 12, 2010, at 6:36 AM, Hideyuki Jitsumoto wrote:
>
>> Hi Members,
>>
>> I tried to use checkpoint/restart with openmpi,
>> but I cannot get correct checkpoint data.
>> I prepared the execution environment as follows. The strings in () are the
>> names of the output files attached to my next e-mail (because of the mail
>> size limit):
>>
>> 1. installed BLCR and checked that BLCR is working correctly with "make check"
>> 2. executed ./configure with some parameters in the openMPI source dir
>> (config.output / config.log)
>> 3. executed make and make install (make.output.2 / install.output.2)
>> 4. confirmed that mca_crs_blcr.[la|so] and mca_crs_self.[la|so] exist in
>> /${INSTALL_DIR}/lib/openmpi
>> 5. created ~/.openmpi/mca-params.conf (mca-params.conf)
>> 6. compiled NPB and executed it with -am ft-enable-cr
>> 7. invoked ompi-checkpoint <MPIRUN_PID>
>>
>> As a result, I got the message "Checkpoint failed: no processes
>> checkpointed."
>> (cr_test_cg)
>
> It is unclear from the output what caused the checkpoint to fail. Can you
> turn on some verbose arguments and send me the output?
>
> Put the following options in your ~/.openmpi/mca-params.conf:
> #---------------
> orte_debug_daemons=1
> snapc_full_verbose=20
> crs_base_verbose=10
> opal_cr_verbose=10
> #---------------
>
>>
>> In addition, when I checked the ompi_info output as in your demo movie, I got
>> "MCA crs: none (MCA v2.0, API v2.0, Component v1.4.1)" (open_info.output)
>
> This is actually a known bug with ompi_info. I have a fix in the works for
> it, and it should be available soon. Until then the ticket is linked below:
> https://svn.open-mpi.org/trac/ompi/ticket/2097
>
>>
>> What should I do to get checkpointing working?
>> Any guidance in this regard would be highly appreciated.
>
> Let's see what the verbose output tells us, and go from there. What version
> of BLCR are you using?
>
> -- Josh
>
>>
>> Thank you,
>> Hideyuki
>>
>> --
>> Sincerely Yours,
>> Hideyuki Jitsumoto (jitum...@gsic.titech.ac.jp)
>> Tokyo Institute of Technology
>> Global Scientific Information and Computing center (Matsuoka Lab.)
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

--
Sincerely Yours,
Hideyuki Jitsumoto (jitum...@gsic.titech.ac.jp)
Tokyo Institute of Technology
Global Scientific Information and Computing center (Matsuoka Lab.)