Hello, I'm trying to get commands such as orte-checkpoint and orte-restart to work, but I am running into errors that I cannot make sense of.
BLCR (0.8.2) appears to work fine, and I have installed Open MPI 1.4.2 from source with the BLCR option enabled.

The command mpirun -np 4 -am ft-enable-cr ./checkpoint_test seemed OK, but orte-checkpoint --term PID_of_checkpoint_test (with the PID obtained from ps -ef | grep mpirun) does not return and prints nothing, not even an error.

I then checked with ompi-ps, and this time I get:

oob-tcp: Communication retries exceeded. Can not communicate with peer

Has anyone had the same problem? Any ideas are welcome!

Thanks,
Son.

--
Son NGUYEN KIM
Antibes 06600
Tel: 06 48 28 37 47