(Sorry for the late reply)

On Jun 7, 2010, at 4:48 AM, Nguyen Kim Son wrote:

> Hello,
> 
> I'm trying to get tools like orte-checkpoint, orte-restart, ... to work, but 
> I'm hitting some errors that I don't have any clue about.
> 
> BLCR (0.8.2) apparently works fine, and I have installed Open MPI 1.4.2 from 
> source with the blcr option. 
> The command
> mpirun -np 4  -am ft-enable-cr ./checkpoint_test
> seemed OK, but 
> orte-checkpoint --term PID_of_checkpoint_test (PID obtained from ps -ef | grep 
> mpirun)
> does not return and prints no error messages!

You mean the PID of 'mpirun', right?

Does it checkpoint correctly without the '--term' argument?

Can you try the v1.5 release candidate to see if you have the same problem?
  http://www.open-mpi.org/software/ompi/v1.5/

What MCA parameters do you have set in your environment?
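
In case it helps, here is a rough sketch of one way to collect that information. It assumes the usual places MCA parameters come from: OMPI_MCA_* environment variables and the per-user file $HOME/.openmpi/mca-params.conf. The helper names list_mca_env and show_mca_file are made up for illustration and are not part of Open MPI.

```shell
# List MCA parameters set through the environment (OMPI_MCA_<name>=<value>).
# list_mca_env is a hypothetical helper name, not an Open MPI command.
list_mca_env() {
    env | grep '^OMPI_MCA_' || echo "no OMPI_MCA_* variables set"
}

# Print the per-user MCA parameter file if it exists.
# The default path is an assumption; pass another path as the first argument.
show_mca_file() {
    conf="${1:-$HOME/.openmpi/mca-params.conf}"
    if [ -f "$conf" ]; then
        cat "$conf"
    else
        echo "no $conf"
    fi
}

list_mca_env
show_mca_file
```

Pasting the output of both into your reply should be enough. A system-wide parameter file under the install prefix may also apply, and it is not covered by this sketch.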

-- Josh

> 
> Then I checked with 
> ompi-ps
> and this time I got:
> oob-tcp: Communication retries exceeded.  Can not communicate with peer
> 
> Does anyone have the same problem?
> Any ideas are welcome!
> Thanks,
> Son.
> 
> 
> -- 
> ---------------------------------------------------------
> Son NGUYEN KIM          
> Antibes 06600
> Tel: 06 48 28 37 47 

