IIRC, 1.6.5 defaults to *not* using the tree spawn, which would explain why the same ssh configuration worked there. We changed the default in the 1.7 series because launch performance is so much better with the tree spawn.
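
With the tree spawn, intermediate nodes run ssh themselves, so it is not enough for node0 to reach every node; each node has to reach the others without a prompt. A minimal sketch of such a check, assuming plain OpenSSH on every node. The function name `check_ssh_pairs` and the `SSH` override variable are made up for illustration and are not part of Open MPI:

```shell
#!/bin/sh
# Sketch only: test non-interactive ssh between every ordered pair of hosts.
# SSH is a hypothetical override so the loop can be exercised with a stub.
SSH="${SSH:-ssh}"

check_ssh_pairs() {
    # Arguments: the host names to test, e.g. node0 node1 node2 node3
    failures=0
    for src in "$@"; do
        for dst in "$@"; do
            [ "$src" = "$dst" ] && continue
            # BatchMode=yes makes ssh fail instead of prompting for a
            # password or for host key confirmation.
            if ! $SSH -o BatchMode=yes "$src" \
                    ssh -o BatchMode=yes "$dst" true </dev/null; then
                echo "FAILED: $src -> $dst"
                failures=$((failures + 1))
            fi
        done
    done
    # Exit status is 0 only if every pair succeeded.
    [ "$failures" -eq 0 ]
}
```

Calling `check_ssh_pairs node0 node1 node2 node3` from the head node prints one line per broken hop. A hop that fails here is a candidate for the "Host key verification failed" message seen during a tree-spawn launch.
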
On Nov 11, 2013, at 8:22 AM, Christoffer Hamberg <christoffer.hamb...@gmail.com> wrote:

> I re-configured the ssh keys now and for some reason it seems to work. But
> what baffles me is that the same ssh configuration worked for the other
> installation (1.6.5) but not for this one.
>
> Thanks for the help!
>
>
> 2013/11/11 Reuti <re...@staff.uni-marburg.de>
> On 11.11.2013 at 10:04, Christoffer Hamberg wrote:
>
> > (Correction: I mixed up the output of the first two examples in my first
> > mail, so it fails on the first one)
> >
> > ubuntu@node0:~$ mpirun --leave-session-attached -mca plm_base_verbose 5 -np 4 -host node0,node1,node2,node3 hostname
> > [node0:01486] mca:base:select:( plm) Querying component [slurm]
> > [node0:01486] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
> > [node0:01486] mca:base:select:( plm) Querying component [rsh]
> > [node0:01486] mca:base:select:( plm) Query of component [rsh] set priority to 10
> > [node0:01486] mca:base:select:( plm) Selected component [rsh]
> > [node2:26962] mca:base:select:( plm) Querying component [rsh]
> > [node2:26962] mca:base:select:( plm) Query of component [rsh] set priority to 10
> > [node2:26962] mca:base:select:( plm) Selected component [rsh]
> > [node1:11477] mca:base:select:( plm) Querying component [rsh]
> > [node1:11477] mca:base:select:( plm) Query of component [rsh] set priority to 10
> > [node1:11477] mca:base:select:( plm) Selected component [rsh]
> > Host key verification failed.
> >
> >
> > ubuntu@node0:~$ mpirun -mca plm_rsh_no_tree_spawn 1 -np 4 -host node0,node1,node2,node3 hostname
> > node0
> > node1
> > node2
> > node3
> >
> > So it definitely looks like a problem with the tree spawn. Any clue how I
> > could proceed?
>
> Is passphraseless ssh also possible between the nodes? Using host-based
> authentication, it's also possible to enable it for all users without
> having to prepare the ssh keys.
>
> -- Reuti
>
>
> > /Christoffer
> >
> >
> > 2013/11/11 Ralph Castain <r...@open-mpi.org>
> > Add --enable-debug to your configure and run it with the following
> > additional options
> >
> > --leave-session-attached -mca plm_base_verbose 5
> >
> > Let's see where it fails during the launch phase. Offhand, the only thing
> > that message means to me is that the ssh keys are botched on at least one
> > node. Keep in mind that we use a tree-based launch, and so when you have
> > more than two nodes, one or more of the intermediate nodes are executing an
> > ssh.
> >
> > One way to see if that's the problem is to launch without the tree spawn:
> > add
> >
> > -mca plm_rsh_no_tree_spawn 1
> >
> > to your cmd line and see if it works.
> >
> >
> >
> > On Nov 10, 2013, at 9:24 AM, Christoffer Hamberg
> > <christoffer.hamb...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I'm having some strange problems running Open MPI (1.9a1r29559) with Java
> >> bindings on a Calxeda highbank ARM Server running Ubuntu 12.10 (GNU/Linux
> >> 3.5.0-43-highbank armv7l).
> >>
> >> The problem arises when I try to run a job on more than 3 nodes (I have a
> >> total of 8).
> >> Note: It's the same error for any of the node[0-7].
> >>
> >> ubuntu@node0:~$ mpirun -np 4 -host node0,node1,node2 hostname
> >> Host key verification failed.
> >>
> >> ubuntu@node0:~$ mpirun -np 4 -host node0,node1,node2,node3 hostname
> >> node0
> >> node0
> >> node1
> >> node2
> >>
> >> and not running the job on the current node also gives Host key
> >> verification failed for only 3 nodes.
> >>
> >> ubuntu@node0:~$ mpirun -np 4 -host node1,node3,node5 hostname
> >> Host key verification failed.
> >>
> >> But not on 2 nodes:
> >> ubuntu@node0:~$ mpirun -np 4 -host node1,node3 hostname
> >> node1
> >> node1
> >> node3
> >> node3
> >>
> >> I've configured it with the following:
> >> ./configure --prefix=/opt/openmpi-1.9-java --without-openib
> >> --enable-static --with-threads=posix --enable-mpi-thread-multiple
> >> --enable-mpi-java --with-jdk-bindir=/usr/lib/jvm/java-7-openjdk-armhf/bin
> >> --with-jdk-headers=/usr/lib/jvm/java-7-openjdk-armhf/include
> >>
> >> I have Open MPI 1.6.5 (without Java-binding) installed and it runs without
> >> any problems on all nodes, so there should be no problem with SSH that the
> >> error points to.
> >>
> >> Any ideas?
> >>
> >> Regards,
> >> Christoffer
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users