Hello everyone,

I'm using OpenMPI 1.4.2 with BLCR 0.8.2 to test checkpointing on 2 nodes, but
restarting from the checkpoint fails with a segmentation fault.
Here are the details concerning my problem:

+ OS: CentOS 5.4
+ OpenMPI configure:
./configure --with-ft=cr --enable-ft-thread --enable-mpi-threads \
--with-blcr=/home/nguyen/opt/blcr \
--with-blcr-libdir=/home/nguyen/opt/blcr/lib \
--prefix=/home/nguyen/opt/openmpi \
--enable-mpirun-prefix-by-default
+ Launch command:
mpirun -am ft-enable-cr -machinefile host ./test

I checkpointed the test program with "ompi-checkpoint -v -s PID", and the
checkpoint file was created successfully. However, restarting it with
ompi-restart failed:
"mpirun noticed that process rank 0 with PID 21242 on node rc014.local
exited on signal 11 (Segmentation fault)"
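A minimal sketch of the checkpoint/restart cycle described above, written as POSIX sh functions. Several details are assumptions, not taken from the post: PID is the PID of the mpirun process, the checkpoint uses Open MPI's default global snapshot name (ompi_global_snapshot_<PID>.ckpt) in the default snapshot directory ($HOME), and the original job is stopped before restarting. The helper names snapshot_ref and checkpoint_and_restart are hypothetical.

```sh
#!/bin/sh
# Sketch only. Assumption: snapshots use Open MPI's default global
# snapshot name and directory ($HOME); adjust if configured differently.

# Hypothetical helper: global snapshot reference for a given mpirun PID.
snapshot_ref() {
    echo "ompi_global_snapshot_$1.ckpt"
}

# Hypothetical helper: checkpoint the job whose mpirun has PID $1,
# stop the original job, then restart it from the checkpoint.
checkpoint_and_restart() {
    mpirun_pid=$1

    # Same invocation as in the post: verbose, with status messages.
    ompi-checkpoint -v -s "$mpirun_pid" || return 1

    # Stop the original job and wait until mpirun has exited, so the
    # restarted job does not run alongside it.
    kill "$mpirun_pid"
    while kill -0 "$mpirun_pid" 2>/dev/null; do
        sleep 1
    done

    # ompi-restart takes the global snapshot reference, not a PID.
    # Assumption: the snapshot lives in $HOME (the default location).
    # The subshell keeps the caller's working directory unchanged.
    (cd "$HOME" && ompi-restart "$(snapshot_ref "$mpirun_pid")")
}
```

For example, with mpirun running as PID 12345 (a placeholder), the cycle would be run as `checkpoint_and_restart 12345`.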
Did I miss something in the installation of OpenMPI?

Regards,
Nguyen Toan
