The problem is that you misspelled the MCA parameter: its name uses underscores, not dashes. It should be:
-mca plm_rsh_agent rsh
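If you want rsh for every run rather than passing it on each command line, the same parameter should also be settable in the per-user MCA parameter file. I believe that file defaults to $HOME/.openmpi/mca-params.conf, but check your installation. This is a minimal sketch, assuming ompi-restart hands the setting on to the mpirun it launches:

```
# $HOME/.openmpi/mca-params.conf (default per-user location, assumed)
# Note the underscores: plm_rsh_agent, not plm-rsh-agent
plm_rsh_agent = rsh
```

Exporting OMPI_MCA_plm_rsh_agent=rsh in the environment before running ompi-restart should have the same effect.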
On Jun 11, 2009, at 10:34 AM, Gleb Crazy Sage Igumnov wrote:
Hello. I have the following problem: I'm trying to restart a parallel job
on our cluster using the following command line:
/common/openmpi-1.3.2/ompi-restart -mca plm-rsh-agent rsh -verbose -hostfile hfile ompi_global_snapshot_25229.ckpt
Despite using this MCA option, I got the following error message:
--------------------------------------------------------------------------
[umu2:26112] Checking for the existence of (/home/s0032/ompi_global_snapshot_25229.ckpt)
[umu2:26112] Restarting from file (ompi_global_snapshot_25229.ckpt)
[umu2:26112] Exec in self
ssh: connect to host umu3 port 22: Connection refused
--------------------------------------------------------------------------
A daemon (pid 26113) died unexpectedly with status 1 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished
--------------------------------------------------------------------------
What can I do to make ompi-restart use rsh instead of ssh?
--
With best regards,
Gleb "Crazy Sage" Igumnov mailto:crazy.s...@gmail.com
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users