I am using openmpi-1.6.1 and need to try checkpoint/restart (self, BLCR).
After installing Open MPI, I have the following in my installation folder:
bin/ompi-checkpoint
bin/ompi-restart
lib/openmpi/mca_crs_self.la
lib/openmpi/mca_crs_self.so
lib/openmpi/mca_crs_blcr.la
lib/openmpi/mca_crs_blcr.so
However, ompi_info reports only the "none" crs component:
ompi_info | grep FT
FT Checkpoint support: yes (checkpoint thread: yes)
ompi_info | grep crs
MCA crs: none (MCA v2.0, API v2.0, Component v1.6.1)
When I try to take a checkpoint, it fails:
basma@basma-Satellite-A500:~$ /OpenMP/openmpi-1.6.1/builddir/bin/mpirun -np 3 -am ft-enable-cr /home/basma/NPB3.3/NPB3.3/NPB3.3-OMP/bin/lu.A
NAS Parallel Benchmarks (NPB3.3-OMP) - LU Benchmark
Size: 64x 64x 64
Iterations: 250
Number of available threads: 4
NAS Parallel Benchmarks (NPB3.3-OMP) - LU Benchmark
Size: 64x 64x 64
Iterations: 250
Number of available threads: 4
NAS Parallel Benchmarks (NPB3.3-OMP) - LU Benchmark
Size: 64x 64x 64
Iterations: 250
Number of available threads: 4
Time step 1
Time step 1
Time step 1
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 2917 on node basma-Satellite-A500
exited on signal 10 (User defined signal 1).
--------------------------------------------------------------------------
basma@basma-Satellite-A500:~$
This happened when I ran the following command from a second shell:
basma@basma-Satellite-A500:~$ /OpenMP/openmpi-1.6.1/builddir/bin/ompi-checkpoint 2916
What did I do wrong?
Thank you.