Re: [OMPI users] Cannot run a job with more than 3 nodes

2014-03-12 Thread Jeff Squyres (jsquyres)
Can you verify that for all 4 nodes? I.e., something like this:

    foreach node (Node1 Node2 Node3 Node4)
      foreach other (Node1 Node2 Node3 Node4)
        echo from $node to $other
        ssh $node ssh $other hostname
      end
    end

On Mar 12, 2014, at 7:34 AM, Victor wrote: > Yes they are. Can resolve and log i
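The loop above is csh-style. For anyone who wants the same all-pairs check from a script, here is a minimal Python sketch. The host names in `NODES` are placeholders, the function names are made up for illustration, and the `BatchMode=yes` option is my addition so that a missing key shows up as a failure rather than a password prompt.

```python
import subprocess
from itertools import product

# Placeholder host names; replace with the names from your hostfile.
NODES = ["node1", "node2", "node3", "node4"]


def all_pairs(nodes):
    """Every ordered (node, other) pair, including a node paired with itself."""
    return list(product(nodes, repeat=2))


def hop_command(node, other):
    """Build the argv for: ssh <node> ssh <other> hostname."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    batch = ["-o", "BatchMode=yes"]
    return ["ssh", *batch, node, "ssh", *batch, other, "hostname"]


def check_all_pairs(nodes, timeout=15):
    """Run the two-hop ssh for every pair and return (node, other, ok) tuples."""
    results = []
    for node, other in all_pairs(nodes):
        try:
            proc = subprocess.run(
                hop_command(node, other),
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            ok = proc.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            # A hang or a missing ssh binary counts as a failed hop.
            ok = False
        results.append((node, other, ok))
    return results
```

Calling `check_all_pairs(NODES)` from the node that runs mpirun and printing the tuples whose last field is `False` should point at the pair that cannot reach each other without a prompt.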

Re: [OMPI users] Cannot run a job with more than 3 nodes

2014-03-12 Thread Victor
Yes they are. Can resolve and log into each node, from each node, using their "friendly" name, not IP. On 12 March 2014 18:15, Jeff Squyres (jsquyres) wrote: > Are all names resolvable from all servers? > > I.e., if you "ssh Node4" from Node1, Node2, and Node3, does it work? > > > On Mar 12, 20

Re: [OMPI users] Cannot run a job with more than 3 nodes

2014-03-12 Thread Jeff Squyres (jsquyres)
Are all names resolvable from all servers? I.e., if you "ssh Node4" from Node1, Node2, and Node3, does it work? On Mar 12, 2014, at 4:07 AM, Victor wrote: > Hostname no I use lower case, but for some reason while I was writing the > email I thought that upper case is clearer... > > The s

Re: [OMPI users] Cannot run a job with more than 3 nodes

2014-03-12 Thread Victor
Hostnames: no, I use lower case, but for some reason while I was writing the email I thought that upper case would be clearer... The same version of Ubuntu (12.04 x64) is on all nodes, and Open MPI and the executable are shared via NFS. On 12 March 2014 16:01, Reuti wrote: > Hi, > > On 12.03.2014 at

Re: [OMPI users] Cannot run a job with more than 3 nodes

2014-03-12 Thread Reuti
Hi, On 12.03.2014 at 07:37, Victor wrote: > I am using openmpi 1.7.4 on Ubuntu 12.04 x64 and I have a very odd problem. > > I have 4 nodes, all of which are defined in the hostfile and in /etc/hosts. > > I can log into each node using ssh and certificate method from the shell that > is runnin

Re: [OMPI users] Cannot run a job with more than 3 nodes

2014-03-12 Thread Victor
I "fixed it" by finding the message regarding tree spawn in a thread from November 2013. When I run the job with -mca plm_rsh_no_tree_spawn 1 the job works over 4 nodes. I cannot identify any errors in ssh key setup and since I am only using 4 nodes I am not concerned about somewhat slower launch
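For reference, a sketch of how the workaround might be placed on the command line. As far as I understand it, with tree spawn enabled some daemons are started by ssh from other compute nodes rather than from the node running mpirun, so disabling it means only the launching node needs working ssh to every host. The hostfile name, process count, and executable below are placeholders, and `mpirun_command` is a hypothetical helper, not part of Open MPI.

```python
import shlex


def mpirun_command(hostfile, nprocs, executable, tree_spawn=False):
    """Build an mpirun argv, disabling tree spawn unless tree_spawn is True."""
    argv = ["mpirun"]
    if not tree_spawn:
        # The MCA parameter quoted in the message above.
        argv += ["-mca", "plm_rsh_no_tree_spawn", "1"]
    argv += ["-hostfile", hostfile, "-np", str(nprocs), executable]
    return argv


# Placeholder values; substitute your own hostfile and binary.
print(shlex.join(mpirun_command("hosts", 4, "./a.out")))
```

The printed line can be pasted into a shell as is; building it as a list first just makes it easy to toggle the parameter when comparing launches with and without tree spawn.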