Hi Hugh,

Just to make sure: have you installed Open MPI on all your nodes, and is it the same version everywhere?
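One way to check this from the head node is sketched below. It is only a rough sketch under some assumptions: the hostfile has one `hostname slots=N` entry per line, `ompi_info` (which ships with Open MPI) is on the PATH that a non-interactive SSH login sees, and the function names `list_hosts` and `check_versions` are made up for illustration.

```shell
#!/bin/sh
# Sketch: report the Open MPI version that a non-interactive SSH login sees on each node.
# Assumption: the hostfile has one "hostname slots=N" entry per line.

# Print the first field (the host name) of every non-blank, non-comment hostfile line.
list_hosts() {
    awk 'NF > 0 && $1 !~ /^#/ { print $1 }' "$1"
}

# Run ompi_info on each host over SSH and print "host: Open MPI: <version>".
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_versions() {
    for host in $(list_hosts "$1"); do
        version=$(ssh -n -o BatchMode=yes "$host" ompi_info 2>/dev/null \
            | grep 'Open MPI:' | sed 's/^ *//')
        echo "$host: ${version:-ompi_info not found or SSH failed}"
    done
}
```

After sourcing the file, `check_versions hostfile` prints one line per node. If a node reports that `ompi_info` was not found even though Open MPI is installed there, it may be that the Open MPI `bin` directory is not on the PATH for non-interactive shells on that node.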
Jody

On Tue, Apr 28, 2009 at 12:57 PM, Hugh Dickinson <h.j.dickin...@durham.ac.uk> wrote:
> Hi all,
>
> First of all let me make it perfectly clear that I'm a complete beginner as
> far as MPI is concerned, so this may well be a trivial problem!
>
> I've tried to set up Open MPI to use SSH to communicate between nodes on a
> heterogeneous cluster. I've set up passwordless SSH and it seems to be
> working fine. For example by hand I can do:
>
> ssh nodename uptime
>
> and it returns the appropriate information for each node.
> I then tried running a non-MPI program on all the nodes at the same time:
>
> mpirun -np 10 --hostfile hostfile uptime
>
> Where hostfile is a list of the 10 cluster node names with slots=1 after
> each one i.e
>
> nodename1 slots=1
> nodename2 slots=2
> etc...
>
> Nothing happens! The process just seems to hang. If I interrupt the process
> with Ctrl-C I get:
>
> "
>
> mpirun: killing job...
>
> [gamma2.phyastcl.dur.ac.uk:18124] [0,0,0] ORTE_ERROR_LOG: Timeout in file
> base/pls_base_orted_cmds.c at line 275
> [gamma2.phyastcl.dur.ac.uk:18124] [0,0,0] ORTE_ERROR_LOG: Timeout in file
> pls_rsh_module.c at line 1166
> --------------------------------------------------------------------------
> WARNING: mpirun has exited before it received notification that all
> started processes had terminated. You should double check and ensure
> that there are no runaway processes still executing.
> --------------------------------------------------------------------------
>
> "
>
> If, instead of using the hostfile, I specify on the command line the host
> from which I'm running mpirun, e.g.:
>
> mpirun -np 1 --host nodename uptime
>
> then it works (i.e. if it doesn't need to communicate with other nodes). Do
> I need to tell Open MPI it should be using SSH to communicate? If so, how do
> I do this? To be honest I think it's trying to do so, because before I set
> up passwordless SSH it challenged me for lots of passwords.
>
> I'm running Open MPI 1.2.5 installed with Scientific Linux 5.2. Let me
> reiterate, it's very likely that I've done something stupid, so all
> suggestions are welcome.
>
> Cheers,
>
> Hugh
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users