Hi Hugh

You're right, there is no initialization command (like lamboot) that you have to call.
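In case it helps, here is one way to run the basic per-node checks (passwordless ssh and a single-host mpirun) against every node in one go. This is just a sketch: it assumes your host file is named "hostfile", sits in the current directory, and has one "nodename slots=N" entry per line, as in your earlier mail.

```shell
#!/bin/sh
# Pull the bare hostnames out of the hostfile (drop the "slots=N" column).
for node in $(awk '{print $1}' hostfile); do
    echo "=== $node ==="
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a broken passwordless setup shows up immediately.
    ssh -o BatchMode=yes "$node" uptime || echo "ssh to $node FAILED"
    # Launch one non-MPI process on that node through Open MPI itself.
    mpirun -np 1 --host "$node" uptime || echo "mpirun on $node FAILED"
done
```

Any node that prints FAILED for the ssh check is a good candidate for the hang, since mpirun starts its remote daemons over that same channel.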
I don't really know why your setup doesn't work, so I'm taking some more "blind shots":
Can you do passwordless ssh between any two of your nodes?
Does
  mpirun -np 1 --host nodenameX uptime
work for every X when called from any of your nodes?
Have you tried
  mpirun -np 2 --host nodename1,nodename2 uptime
(i.e. not using the host file)?

Jody

On Tue, Apr 28, 2009 at 4:37 PM, Hugh Dickinson <h.j.dickin...@durham.ac.uk> wrote:
> Hi Jody,
>
> The node names are exactly the same. I wanted to avoid updating the version
> because I'm not the system administrator, and it could take some time before
> it gets done. If it's likely to fix the problem, though, I'll try it. I'm
> assuming that I don't have to do something analogous to the old "lamboot"
> command to initialise Open MPI on all the nodes. I've seen no documentation
> anywhere that says I should.
>
> Cheers,
>
> Hugh
>
> On 28 Apr 2009, at 15:28, jody wrote:
>
>> Hi Hugh
>>
>> Again, just to make sure: are the hostnames in your host file well-known?
>> I.e., when you say you can do
>>   ssh nodename uptime
>> do you use exactly the same nodename in your host file?
>> (I'm trying to eliminate all non-Open-MPI error sources,
>> because with your setup it should basically work.)
>>
>> One more point to consider is updating to Open MPI 1.3.
>> I don't think your Open MPI version is the cause of your trouble,
>> but there have been quite a few changes since v1.2.5.
>>
>> Jody
>>
>> On Tue, Apr 28, 2009 at 3:22 PM, Hugh Dickinson
>> <h.j.dickin...@durham.ac.uk> wrote:
>>>
>>> Hi Jody,
>>>
>>> Indeed, all the nodes are running the same version of Open MPI. Perhaps I
>>> was incorrect to describe the cluster as heterogeneous. In fact, all the
>>> nodes run the same operating system (Scientific Linux 5.2); it's only the
>>> hardware that's different, and even then they're all i386 or i686. I'm
>>> also attaching the output of ompi_info --all, as I've seen it's suggested
>>> in the mailing list instructions.
>>>
>>> Cheers,
>>>
>>> Hugh
>>>
>>> Hi Hugh
>>>
>>> Just to make sure:
>>> You have installed Open MPI on all your nodes?
>>> Same version everywhere?
>>>
>>> Jody
>>>
>>> On Tue, Apr 28, 2009 at 12:57 PM, Hugh Dickinson
>>> <h.j.dickinson_at_[hidden]> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> First of all, let me make it perfectly clear that I'm a complete
>>>> beginner as far as MPI is concerned, so this may well be a trivial
>>>> problem!
>>>>
>>>> I've tried to set up Open MPI to use SSH to communicate between nodes
>>>> on a heterogeneous cluster. I've set up passwordless SSH and it seems
>>>> to be working fine. For example, by hand I can do:
>>>>
>>>>   ssh nodename uptime
>>>>
>>>> and it returns the appropriate information for each node.
>>>> I then tried running a non-MPI program on all the nodes at the same
>>>> time:
>>>>
>>>>   mpirun -np 10 --hostfile hostfile uptime
>>>>
>>>> where hostfile is a list of the 10 cluster node names with slots=1
>>>> after each one, i.e.
>>>>
>>>>   nodename1 slots=1
>>>>   nodename2 slots=1
>>>>   etc...
>>>>
>>>> Nothing happens! The process just seems to hang. If I interrupt the
>>>> process with Ctrl-C I get:
>>>>
>>>> "
>>>> mpirun: killing job...
>>>>
>>>> [gamma2.phyastcl.dur.ac.uk:18124] [0,0,0] ORTE_ERROR_LOG: Timeout in
>>>> file base/pls_base_orted_cmds.c at line 275
>>>> [gamma2.phyastcl.dur.ac.uk:18124] [0,0,0] ORTE_ERROR_LOG: Timeout in
>>>> file pls_rsh_module.c at line 1166
>>>> --------------------------------------------------------------------------
>>>> WARNING: mpirun has exited before it received notification that all
>>>> started processes had terminated. You should double check and ensure
>>>> that there are no runaway processes still executing.
>>>> --------------------------------------------------------------------------
>>>> "
>>>>
>>>> If, instead of using the hostfile, I specify on the command line the
>>>> host from which I'm running mpirun, e.g.:
>>>>
>>>>   mpirun -np 1 --host nodename uptime
>>>>
>>>> then it works (i.e. if it doesn't need to communicate with other
>>>> nodes). Do I need to tell Open MPI that it should be using SSH to
>>>> communicate? If so, how do I do this? To be honest, I think it's
>>>> trying to do so, because before I set up passwordless SSH it
>>>> challenged me for lots of passwords.
>>>>
>>>> I'm running Open MPI 1.2.5 as installed with Scientific Linux 5.2. Let
>>>> me reiterate, it's very likely that I've done something stupid, so all
>>>> suggestions are welcome.
>>>>
>>>> Cheers,
>>>>
>>>> Hugh
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
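On the question buried in the quoted mail above (whether Open MPI must be told to use SSH): the rsh/ssh launcher can be told explicitly which agent to run, and the remote daemons can be made verbose, which often reveals where a hang like the one above occurs. The following is only a sketch for the 1.2.x series: in 1.2.x the parameter belonged to the "pls" framework (pls_rsh_agent, matching the pls_rsh_module.c in the error log); later releases renamed it to plm_rsh_agent.

```shell
# Force the launcher to use ssh explicitly (1.2.x parameter name):
mpirun -np 10 --hostfile hostfile --mca pls_rsh_agent ssh uptime

# Keep the remote daemons' output attached, so a stalled launch
# shows which node it is waiting on instead of hanging silently:
mpirun -np 10 --hostfile hostfile --debug-daemons uptime
```

If the second command hangs, the last node mentioned in the daemon output is usually the one whose ssh or firewall configuration needs a closer look.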