(Sorry for the late reply)

On Jun 7, 2010, at 4:48 AM, Nguyen Kim Son wrote:
> Hello,
>
> I'm trying to get functions like orte-checkpoint, orte-restart, ... to work, but
> there are some errors that I don't have any clue about.
>
> BLCR (0.8.2) apparently works fine, and I have installed Open MPI 1.4.2 from
> source with the BLCR option.
> The command
>   mpirun -np 4 -am ft-enable-cr ./checkpoint_test
> seemed OK, but
>   orte-checkpoint --term PID_of_checkpoint_test (obtained from ps -ef | grep
>   mpirun)
> does not return and shows no errors!

You mean the PID of 'mpirun', right?

Does it checkpoint correctly without the '--term' argument?

Can you try the v1.5 release candidate to see if you have the same problem?
  http://www.open-mpi.org/software/ompi/v1.5/

What MCA parameters do you have set in your environment?

-- Josh

>
> Then I checked with
>   ompi-ps
> This time, I get:
>   oob-tcp: Communication retries exceeded. Can not communicate with peer
>
> Does anyone have the same problem?
> Any idea is welcome!
> Thanks,
> Son.
>
> --
> Son NGUYEN KIM
> Antibes 06600
> Tel: 06 48 28 37 47