On Mar 1, 2006, at 5:26 PM, Xiaoning (David) Yang wrote:
I installed Open MPI 1.0.1 on two Mac G5s (one with two CPUs and the
other with four CPUs). I set up ssh on both machines according to the
FAQ. My MPI jobs work fine if I run them on only one computer. But
when I ran a job across the two Macs from the first Mac, mac1, I got:
mac1: mpirun -np 6 --hostfiles /Users/me/my_hosts hello_world
tcsh: orted: Command not found.
[mac1:01019] ERROR: A daemon on node mac2 failed to start as expected.
[mac1:01019] ERROR: There may be more information available from
[mac1:01019] ERROR: the remote shell (see above).
[mac1:01019] ERROR: The daemon exited unexpectedly with status 1.
2 processes killed (possibly by Open MPI)
File my_hosts looks like
mac1 slots=2
mac2 slots=4
The orted is definitely on my path on both machines. Any idea? Help is
greatly appreciated!
I'm guessing that the issue is with your shell configuration. mpirun
starts the orted on the remote node through rsh/ssh, which starts a
non-login shell on the remote node. Unfortunately, the set of dotfiles
evaluated when starting a non-login shell is different from the set
evaluated when starting a login shell, so a path set only in a
login-shell dotfile is not seen when orted is launched. The easiest
way to tell if this is the issue is to check whether orted is in your
path when started in a non-login shell with a command like:
ssh remote_host which orted
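
If you have more than a couple of nodes, the same check can be wrapped in a small loop. This is only a rough sketch: the function name check_cmd_on_hosts and the REMOTE_SH variable are made up for illustration and are not part of Open MPI. It also assumes that "which" prints an absolute path when the command is found and that your dotfiles print nothing else.

```shell
#!/bin/sh
# Check whether a command is on the PATH that a non-login shell sees
# on each listed host. REMOTE_SH is a hypothetical knob (not part of
# Open MPI) naming the remote-shell program; it defaults to ssh.
check_cmd_on_hosts() {
    cmd=$1
    shift
    rc=0
    for host in "$@"; do
        # Same form as "ssh remote_host which orted": the remote side
        # runs the command in a non-login shell.
        out=$(${REMOTE_SH:-ssh} "$host" which "$cmd" 2>/dev/null)
        case $out in
            /*) echo "$host: $cmd found at $out" ;;
            *)  echo "$host: $cmd NOT found"; rc=1 ;;
        esac
    done
    return "$rc"
}
```

For your setup that would be run as "check_cmd_on_hosts orted mac1 mac2". A non-zero return status means at least one host could not find the command.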
More information on how to configure your particular shell for use
with Open MPI can be found in our FAQ at:
http://www.open-mpi.org/faq/?category=running
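
Since your error message comes from tcsh, the fix is probably a path setting in a dotfile that tcsh reads for every shell (~/.tcshrc, or ~/.cshrc if that does not exist) rather than in ~/.login, which is read only by login shells. A minimal sketch, assuming Open MPI was installed under /opt/openmpi (that prefix is a guess; use whatever prefix you actually installed to):

```csh
# In ~/.cshrc (or ~/.tcshrc) on each node.
# /opt/openmpi is an assumed install prefix; substitute your own.
set path = ( /opt/openmpi/bin $path )
```

After that change, the "ssh remote_host which orted" check should print the full path to orted.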
Hope this helps,
Brian
--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/