Dear Sir,

I am sending the details as follows:


1. I am using Open MPI 1.3.3 and BLCR 0.8.2.
2. I installed BLCR 0.8.2 first, under /root/MS.
3. Then I installed Open MPI 1.3.3 under /root/MS.
4. I configured and installed Open MPI as follows:

# ./configure --with-ft=cr --enable-mpi-threads --with-blcr=/usr/local/bin \
      --with-blcr-libdir=/usr/local/lib
# make 
# make install
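One way to confirm whether the build above actually picked up BLCR (the configure line passes /usr/local/bin to --with-blcr, while BLCR is described as installed under /root/MS) is to look for a BLCR checkpoint/restart ("crs") component in the `ompi_info` listing. The following is a minimal sketch; the helper name `check_blcr_support` is hypothetical, and the assumption that the listing contains the text "crs: blcr" is labeled in the code.

```shell
#!/bin/sh
# Sketch: report whether the ompi_info found in PATH lists a BLCR
# checkpoint/restart ("crs") component.
# ASSUMPTION: the component appears in ompi_info output as "crs: blcr".
# check_blcr_support is a hypothetical helper name.
check_blcr_support() {
  if ! command -v ompi_info >/dev/null 2>&1; then
    echo "ompi_info not found in PATH"
    return 2
  fi
  if ompi_info | grep -i 'crs: blcr' >/dev/null; then
    echo "ompi_info lists a BLCR crs component"
    return 0
  fi
  echo "ompi_info does not list a BLCR crs component"
  return 1
}

# Run the check; '|| true' keeps a non-zero result from aborting a 'set -e' script.
check_blcr_support || true
```

If the check reports that no BLCR component is listed, that would suggest the installed Open MPI was built without BLCR checkpoint/restart support. That is a hint to verify, not a confirmed cause of the hang.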

Then I added the following to .bash_profile in my home directory (I went to the
home directory with cd ~):

/sbin/insmod /usr/local/lib/blcr/2.6.23.1-42.fc8/blcr_imports.ko 
/sbin/insmod /usr/local/lib/blcr/2.6.23.1-42.fc8/blcr.ko 
PATH=$PATH:/usr/local/bin
MANPATH=$MANPATH:/usr/local/man
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
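The profile lines above assign PATH, MANPATH and LD_LIBRARY_PATH without `export`. A plain assignment only reaches programs started from the shell if the variable was already in the environment. PATH always is, but MANPATH and LD_LIBRARY_PATH may not be. The sketch below demonstrates this with a hypothetical variable name, `DEMO_LIB_PATH`, rather than touching LD_LIBRARY_PATH itself. It is illustrative only, and nothing here establishes that this affects the checkpoint problem.

```shell
#!/bin/sh
# Sketch: a plain assignment creates a shell variable; only exported
# variables are passed to child processes.
# DEMO_LIB_PATH is a hypothetical stand-in for LD_LIBRARY_PATH.
unset DEMO_LIB_PATH
# Append /usr/local/lib without a leading ':' when the variable starts out empty.
DEMO_LIB_PATH=${DEMO_LIB_PATH:+$DEMO_LIB_PATH:}/usr/local/lib
sh -c 'echo "before export: [${DEMO_LIB_PATH:-}]"'
export DEMO_LIB_PATH
sh -c 'echo "after export: [${DEMO_LIB_PATH:-}]"'
```

This prints `before export: []` followed by `after export: [/usr/local/lib]`. In the profile itself, the corresponding form would be `export LD_LIBRARY_PATH=...` (and likewise for MANPATH).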

Then I compiled and ran arr_add.c as follows:

[root@localhost examples]# mpicc -o res arr_add.c
[root@localhost examples]# mpirun -np 2 -am ft-enable-cr ./res

2       2       2       2       2       2       2       2       2       2
2       2       2       2       2       2       2       2       2       2
2       2       2       2       2       2       2       2       2       2
--------------------------------------------------------------------------
Error: The process with PID 5790 is not checkpointable.
       This could be due to one of the following:
        - An application with this PID doesn't currently exist
        - The application with this PID isn't checkpointable
        - The application with this PID isn't an OPAL application.
       We were looking for the named files:
         /tmp/opal_cr_prog_write.5790
         /tmp/opal_cr_prog_read.5790
--------------------------------------------------------------------------
[localhost.localdomain:05788] local) Error: Unable to initiate the handshake with peer [[7788,1],1]. -1
[localhost.localdomain:05788] [[7788,0],0] ORTE_ERROR_LOG: Error in file snapc_full_global.c at line 567
[localhost.localdomain:05788] [[7788,0],0] ORTE_ERROR_LOG: Error in file snapc_full_global.c at line 1054
2       2       2       2       2       2       2       2       2       2
2       2       2       2       2       2       2       2       2       2
2       2       2       2       2       2       2       2       2       2
2       2       2       2       2       2       2       2       2       2
2       2       2       2       2       2       2       2       2       2
2       2       2       2       2       2       2       2       2       2


NOTE: the PID of mpirun is 5788

I gave the following command to take the checkpoint:

[root@localhost examples]# ompi-checkpoint -s 5788

I got the following output, and then it hung:

[localhost.localdomain:05796]                 Requested - Global Snapshot Reference: (null)
[localhost.localdomain:05796]                   Pending - Global Snapshot Reference: (null)
[localhost.localdomain:05796]                   Running - Global Snapshot Reference: (null)


Can anybody resolve this problem?
Kindly help me rectify it.


With regards,

Mallikarjuna Shastry



