On Mar 1, 2006, at 5:26 PM, Xiaoning (David) Yang wrote:

I installed Open MPI 1.0.1 on two Mac G5s (one with two CPUs and the other with four). I set up ssh on both machines according to the FAQ. My MPI jobs work fine if I run them on only one computer. But when I ran a job across the two Macs from the first Mac, mac1, I got:

mac1: mpirun -np 6 --hostfiles /Users/me/my_hosts hello_world
tcsh: orted: Command not found.
[mac1:01019] ERROR: A daemon on node mac2 failed to start as expected.
[mac1:01019] ERROR: There may be more information available from
[mac1:01019] ERROR: the remote shell (see above).
[mac1:01019] ERROR: The daemon exited unexpectedly with status 1.
2 processes killed (possibly by Open MPI)

File my_hosts looks like

mac1 slots=2
mac2 slots=4

orted is definitely on my path on both machines. Any ideas? Help is greatly appreciated!

I'm guessing that the issue is with your shell configuration. mpirun starts orted on the remote node through rsh/ssh, which runs a non-login shell on the remote node. Unfortunately, the set of dotfiles evaluated when a non-login shell starts is different from the set evaluated when a login shell starts, so a path set only in a login-shell dotfile is not visible when orted is launched. The easiest way to tell whether this is the issue is to check whether orted is in your path in a non-login shell, with a command like:

  ssh remote_host which orted
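
To run the same check against every host in a hostfile, something like the following sketch may help. It is only an illustration: the function name check_remote_cmd and the REMOTE_SHELL variable are made up for this example and are not part of Open MPI, and it assumes the `which` command works in the remote account's shell.

```shell
# check_remote_cmd HOST CMD
# Ask HOST, through a non-login remote shell, whether CMD is on the path.
# REMOTE_SHELL is a hypothetical variable for this sketch; it defaults to
# ssh and could be set to rsh instead.
check_remote_cmd() {
    host=$1
    cmd=$2
    if ${REMOTE_SHELL:-ssh} "$host" "which $cmd" >/dev/null 2>&1; then
        echo "$host: $cmd found"
    else
        echo "$host: $cmd NOT found in non-login shell path"
        return 1
    fi
}

# Example use (host names taken from the hostfile in the question):
#   for h in mac1 mac2; do check_remote_cmd "$h" orted; done
```

If a host reports that orted is not found, the path is most likely being set in a file that only login shells read. For tcsh, ~/.login is read only by login shells, while ~/.cshrc (or ~/.tcshrc) is read by every tcsh, so the Open MPI bin directory needs to be added to the path there.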

More information on how to configure your particular shell for use with Open MPI can be found in our FAQ at:

  http://www.open-mpi.org/faq/?category=running


Hope this helps,

Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

