Hi!
Berkeley recently released a new version of BLCR. They had already
marked the function cr_request_file as deprecated in BLCR 0.7.3. Now
they have removed the deprecated functions from the libcr API.
Since OMPI's checkpointing support uses cr_request_file, all
checkpointing operations fail with BLCR
Hi!
I'll work on a patch, and let you know when it is ready. Unfortunately
it probably won't be for a couple weeks. :(
Ok, thanks a lot for letting me know. In three weeks we'll
have a booth at ICT
(http://ec.europa.eu/information_society/events/ict/2008)
where we plan to showcase fault tolera
Hi Tim!
First of all: thanks a lot for answering! :-)
Could you try running your two MPI jobs with fewer procs each,
say 2 or 3 each instead of 4, so that there are a few extra cores available.
This problem occurs with any number of procs.
Also, what happens to the checkpointing of one MP
Hi!
I'm using the development version of OMPI from SVN (rev. 19857)
to execute MPI jobs on my cluster system. In particular, I'm using
the checkpoint and restart feature, based on the most recent version
of BLCR.
Checkpointing works fine as long as I only execute
a single job o
Hi Josh!
I believe this is now fixed in the trunk. I was able to reproduce
with the current trunk and committed a fix a few minutes ago in
r19601. So the fix should be in tonight's tarball (or you can grab it
from SVN). I've made a request to have the patch applied to v1.3, but
that may take a d
Hi Josh!
First of all, thanks a lot for replying. :-)
When executing this checkpoint command, the running application
directly aborts, even though I did not specify the "--term" option:
--
mpirun noticed that process ran
Hi!
Hi, I have installed openmpi-1.2.7 with the following instructions:
./configure --with-ft=cr --enable-ft-enable-thread --enable-mpi-thread
--with-blcr=$HOME/blcr --prefix=$HOME/openmpi
make all install
The bin directory under $HOME/openmpi contains neither ompi-checkpoint nor
ompi-restart.
A
Hi!
Since I am interested in fault tolerance, OMPI's checkpoint and
restart support is an interesting feature for me. So I installed
BLCR 0.7.3 as well as OMPI from SVN (rev. 19553). For OMPI
I followed the instructions in the "Fault Tolerance Guide"
in the OMPI wiki:
./autogen.sh
./configure --with-
Hi Gabriele!
In this case, mpirun works well, but the checkpoint procedure fails:
ompi-checkpoint 20109
[node0316:20134] Error: Unable to get the current working directory
[node0316:20134] [[42404,0],0] ORTE_ERROR_LOG: Not found in file
orte-checkpoint.c at line 395
[node0316:20134] HNP with PI
Hi!
I'm working on a project focusing on fault tolerance in
Grid systems. We have been using LAM-MPI and BLCR so far,
now I want to evaluate OpenMPI.
I read in the mailing lists that the current stable version
of OpenMPI lacks many checkpointing features, making it
recommended to use the deve