Hello everyone, I'm using OpenMPI 1.4.2 with BLCR 0.8.2 to test checkpointing on 2 nodes but it failed to restart (Segmentation fault). Here are the details concerning my problem:
+ OS: CentOS 5.4
+ OpenMPI configure:
    ./configure --with-ft=cr --enable-ft-thread --enable-mpi-threads \
        --with-blcr=/home/nguyen/opt/blcr --with-blcr-libdir=/home/nguyen/opt/blcr/lib \
        --prefix=/home/nguyen/opt/openmpi \
        --enable-mpirun-prefix-by-default
+ mpirun -am ft-enable-cr -machinefile host ./test

I checkpointed the test program using "ompi-checkpoint -v -s PID", and the checkpoint file was created successfully. However, restarting it with ompi-restart failed:

    "mpirun noticed that process rank 0 with PID 21242 on node rc014.local exited on signal 11 (Segmentation fault)"

Did I miss something in the installation of OpenMPI?

Regards,
Nguyen Toan