Yes, they are. I can resolve and log into each node from every other node using its "friendly" name rather than its IP.
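For anyone who wants to script the same check instead of running ssh by hand, here is a minimal Python sketch. It is only an illustration and not part of Open MPI: the function names `parse_hostfile` and `unresolvable` are made up for this example, and it assumes the hostfile keeps the hostname as the first token on each line, with `#` starting a comment. It only tests name resolution through the local resolver, so it has to be run on every node, since a name can resolve on one machine and not on another.

```python
import socket


def parse_hostfile(text):
    """Return the hostnames in hostfile-style text (first token per line).

    Assumption: blank lines and anything after '#' are ignored.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0])
    return hosts


def unresolvable(hosts, resolver=socket.getaddrinfo):
    """Return the hosts for which the resolver raises socket.gaierror.

    The resolver is a parameter so the check can be exercised without DNS.
    """
    failed = []
    for host in hosts:
        try:
            resolver(host, None)
        except socket.gaierror:
            failed.append(host)
    return failed
```

Calling `unresolvable(parse_hostfile(open("hostfile").read()))` on a node returns the list of names that node cannot resolve; an empty list on all four nodes would match what I see when I ssh manually.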
On 12 March 2014 18:15, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Are all names resolvable from all servers?
>
> I.e., if you "ssh Node4" from Node1, Node2, and Node3, does it work?
>
>
> On Mar 12, 2014, at 4:07 AM, Victor <victor.ma...@gmail.com> wrote:
>
> > Hostname.... no I use lower case, but for some reason while I was writing the email I thought that upper case is clearer...
> >
> > The same version of Ubuntu (12.04 x64) is on all nodes and openmpi and the executable are shared via nfs.
> >
> >
> > On 12 March 2014 16:01, Reuti <re...@staff.uni-marburg.de> wrote:
> > Hi,
> >
> > On 12.03.2014 at 07:37, Victor wrote:
> >
> > > I am using openmpi 1.7.4 on Ubuntu 12.04 x64 and I have a very odd problem.
> > >
> > > I have 4 nodes, all of which are defined in the hostfile and in /etc/hosts.
> > >
> > > I can log into each node using ssh and certificate method from the shell that is running the mpi job, by using their name as defined in /etc/hosts.
> > >
> > > I can run an mpi job if I include only 3 nodes in the hostfile, for example:
> > >
> > > Node1 slots=8 max-slots=8
> > > Node2 slots=8 max-slots=8
> > > Node3 slots=8 max-slots=8
> >
> > You are using an uppercase name here by intention - this is the one the host returns by `hostname`? Although it is allowed and should be mangled to lowercase resp. ignored for hostname resolution, I found that not all programs are doing it. Best is to use only lowercase characters is my experience.
> >
> > The same version of your Ubuntu Linux is installed on all machines?
> >
> > -- Reuti
> >
> >
> > > But if I add a fourth node into the hostfile eg:
> > >
> > > Node1 slots=8 max-slots=8
> > > Node2 slots=8 max-slots=8
> > > Node3 slots=8 max-slots=8
> > > Node4 slots=8 max-slots=8
> > >
> > > I get this error after attempting mpirun -np 32 --hostfile hostfile a.out:
> > >
> > > ssh: Could not resolve hostname Node4: Name or service not known.
> > >
> > > But, I can log into Node4 using ssh from the same shell by using ssh Node4.
> > >
> > > Also if I mix up the hostfile like this for example and place Node1 to the last spot:
> > >
> > > Node4 slots=8 max-slots=8
> > > Node2 slots=8 max-slots=8
> > > Node3 slots=8 max-slots=8
> > > Node1 slots=8 max-slots=8
> > >
> > > The error becomes
> > >
> > > ssh: Could not resolve hostname Node1: Name or service not known.
> > >
> > > If I then go back to the three node hostfile like this:
> > >
> > > Node1 slots=8 max-slots=8
> > > Node4 slots=8 max-slots=8
> > > Node2 slots=8 max-slots=8
> > >
> > > There is no error with three nodes even though both Node1 and Node4 "cannot be found" if they are present in a 4 node hostfile in the last spot. The last slot seems to be bugged.
> > >
> > > What is going on? How do I fix this?
> > > _______________________________________________
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/