IIRC, 1.6.5 defaults to *not* using the tree spawn, which would explain why the 
same ssh configuration worked there. We changed the default in the 1.7 series 
because the launch performance is so much better.
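
If you want to confirm that node-to-node ssh is now clean everywhere, a small 
loop like the one below could help. This is only a sketch, not something that 
ships with Open MPI: check_pairs and the SSH variable are names made up for 
illustration, and it assumes a POSIX shell and OpenSSH on every node. It tries 
every ordered pair of hosts with BatchMode=yes, so ssh fails instead of 
prompting when a key or a host key is missing.

```shell
#!/bin/sh
# Hypothetical helper: verify non-interactive ssh between every pair of nodes.
# SSH can be overridden (e.g. with a stub) to dry-run the loop off-cluster.
SSH="${SSH:-ssh}"

check_pairs() {
    failed=0
    for src in "$@"; do
        for dst in "$@"; do
            # Skip a node talking to itself.
            [ "$src" = "$dst" ] && continue
            # Hop to $src first, then try $src -> $dst from there.
            # BatchMode=yes turns any password or host key prompt into a failure.
            if $SSH -o BatchMode=yes "$src" ssh -o BatchMode=yes "$dst" true </dev/null; then
                echo "ok   $src -> $dst"
            else
                echo "FAIL $src -> $dst"
                failed=1
            fi
        done
    done
    return "$failed"
}
```

Calling check_pairs node0 node1 node2 node3 from node0 prints one ok/FAIL line 
per ordered pair and returns non-zero if any hop fails, which should point at 
the node whose keys or known_hosts still need attention.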


On Nov 11, 2013, at 8:22 AM, Christoffer Hamberg 
<christoffer.hamb...@gmail.com> wrote:

> I re-configured the ssh keys now and for some reason it seems to work. But 
> what baffles me is that the same ssh configuration worked for the other 
> installation (1.6.5) but not for this one.
> 
> Thanks for the help!
> 
> 
> 2013/11/11 Reuti <re...@staff.uni-marburg.de>
> On 11.11.2013 at 10:04, Christoffer Hamberg wrote:
> 
> > (Correction: I mixed up the output of the first two examples in my first 
> > mail, so it fails on the first one)
> >
> > ubuntu@node0:~$ mpirun --leave-session-attached -mca plm_base_verbose 5 -np 4 -host node0,node1,node2,node3 hostname
> > [node0:01486] mca:base:select:(  plm) Querying component [slurm]
> > [node0:01486] mca:base:select:(  plm) Skipping component [slurm]. Query failed to return a module
> > [node0:01486] mca:base:select:(  plm) Querying component [rsh]
> > [node0:01486] mca:base:select:(  plm) Query of component [rsh] set priority to 10
> > [node0:01486] mca:base:select:(  plm) Selected component [rsh]
> > [node2:26962] mca:base:select:(  plm) Querying component [rsh]
> > [node2:26962] mca:base:select:(  plm) Query of component [rsh] set priority to 10
> > [node2:26962] mca:base:select:(  plm) Selected component [rsh]
> > [node1:11477] mca:base:select:(  plm) Querying component [rsh]
> > [node1:11477] mca:base:select:(  plm) Query of component [rsh] set priority to 10
> > [node1:11477] mca:base:select:(  plm) Selected component [rsh]
> > Host key verification failed.
> >
> >
> > ubuntu@node0:~$ mpirun -mca plm_rsh_no_tree_spawn 1 -np 4 -host node0,node1,node2,node3 hostname
> > node0
> > node1
> > node2
> > node3
> >
> > So it definitely looks like a problem with the tree spawn. Any clue how I 
> > could proceed?
> 
> Is passphraseless ssh also possible between the nodes themselves? With 
> hostbased authentication you can also enable it for all users at once, 
> without having to prepare ssh keys for each of them.
> 
> -- Reuti
> 
> 
> > /Christoffer
> >
> >
> > 2013/11/11 Ralph Castain <r...@open-mpi.org>
> > Add --enable-debug to your configure line and run it with the following 
> > additional options:
> >
> > --leave-session-attached -mca plm_base_verbose 5
> >
> > Let's see where it fails during the launch phase. Offhand, the only thing 
> > that message means to me is that the ssh keys are botched on at least one 
> > node. Keep in mind that we use a tree-based launch, and so when you have 
> > more than two nodes, one or more of the intermediate nodes are executing an 
> > ssh.
> >
> > One way to see if that's the problem is to launch without the tree spawn: 
> > add
> >
> > -mca plm_rsh_no_tree_spawn 1
> >
> > to your cmd line and see if it works.
> >
> >
> >
> > On Nov 10, 2013, at 9:24 AM, Christoffer Hamberg 
> > <christoffer.hamb...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I'm having some strange problems running Open MPI (1.9a1r29559) with Java 
> >> bindings on a Calxeda Highbank ARM server running Ubuntu 12.10 (GNU/Linux 
> >> 3.5.0-43-highbank armv7l).
> >>
> >> The problem arises when I try to run a job on more than 3 nodes (I have a 
> >> total of 8).
> >> Note: It's the same error for any of the node[0-7].
> >>
> >> ubuntu@node0:~$ mpirun -np 4 -host node0,node1,node2 hostname
> >> Host key verification failed.
> >>
> >> ubuntu@node0:~$ mpirun -np 4 -host node0,node1,node2,node3 hostname
> >> node0
> >> node0
> >> node1
> >> node2
> >>
> >> Leaving the current node out of the host list also gives "Host key 
> >> verification failed" with only 3 nodes:
> >>
> >> ubuntu@node0:~$ mpirun -np 4 -host node1,node3,node5 hostname
> >> Host key verification failed.
> >>
> >> But not on 2 nodes:
> >> ubuntu@node0:~$ mpirun -np 4 -host node1,node3 hostname
> >> node1
> >> node1
> >> node3
> >> node3
> >>
> >> I've configured it with the following:
> >> ./configure --prefix=/opt/openmpi-1.9-java --without-openib \
> >>     --enable-static --with-threads=posix --enable-mpi-thread-multiple \
> >>     --enable-mpi-java --with-jdk-bindir=/usr/lib/jvm/java-7-openjdk-armhf/bin \
> >>     --with-jdk-headers=/usr/lib/jvm/java-7-openjdk-armhf/include
> >>
> >> I also have Open MPI 1.6.5 (without Java bindings) installed, and it runs 
> >> without any problems on all nodes, so SSH itself, which the error points 
> >> to, should not be the problem.
> >>
> >> Any ideas?
> >>
> >> Regards,
> >> Christoffer
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users