Hello,
I'm using openmpi-1.3a1r18241 on a 2-node configuration and am having trouble 
with ompi-restart.  I can successfully ompi-checkpoint and ompi-restart a 
1-way MPI code.
When I try a 2-way job running across 2 nodes, I get:

bash-2.05b$ ompi-restart -verbose ompi_global_snapshot_926.ckpt
[shc005:01159] Checking for the existence of (/home/sharon/ompi_global_snapshot_926.ckpt)
[shc005:01159] Restarting from file (ompi_global_snapshot_926.ckpt)
[shc005:01159]   Exec in self
Restart failed: Permission denied
Restart failed: Permission denied


If I try running as root, using the same snapshot file, the code restarts OK, 
but both tasks end up on the same node, rather than one per node (as in the 
original mpirun).

I'm using BLCR version 0.6.5.
I generate checkpoints via 'ompi-checkpoint pid',
where pid is the pid of the mpirun task below:

mpirun -np 2 -am ft-enable-cr ./xhpl


Thanks very much for any hints you can give on how to resolve either of these 
problems.
