Oops, I meant plm_rsh_no_tree_spawn=false. Thanks for the tip; it turned out the fault lay in one specific node, which required oob_tcp_if_include to be set.
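For anyone who finds this thread later, here is a minimal sketch of how the settings mentioned in this thread might be combined on one command line. The interface name `eth0` is only a placeholder (the thread does not say which interface the node needed), the host names A to D come from the ssh test quoted below, and the `build_cmd` helper and `RUN` variable are hypothetical names used for illustration. The script prints the command first and runs it only when `RUN=1` is set and mpirun is installed.

```shell
#!/bin/sh
# Placeholder values: adjust for your own cluster.
IFACE="eth0"        # assumption: interface the problem node should use for OOB traffic
HOSTS="A,B,C,D"     # host names taken from the ssh test in the quoted message

# Build the mpirun command line as a string so it can be inspected before use.
build_cmd() {
    printf '%s'    "mpirun -mca oob_tcp_if_include $IFACE"
    printf ' %s'   "-mca plm_rsh_no_tree_spawn false"
    printf ' %s'   "-mca plm_base_verbose 5 --debug-daemons"
    printf ' %s\n' "-host $HOSTS hostname"
}

CMD=$(build_cmd)
echo "$CMD"

# Launch only when explicitly requested and mpirun is actually available.
if [ "${RUN:-0}" = "1" ] && command -v mpirun >/dev/null 2>&1; then
    $CMD
fi
```

Setting oob_tcp_if_include in the per-user MCA parameter file instead of on the command line should also work, but which is more convenient depends on whether every node needs the same interface.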
On Friday, 11 April 2014, Ralph Castain <r...@open-mpi.org> wrote:

> I'm a little confused - the "no_tree_spawn=true" option means that we are
> *not* using tree spawn, and so mpirun is directly launching each daemon
> onto its node. Thus, this requires that the host mpirun is on be able to
> ssh to every other host in the allocation.
>
> You can debug the rsh launcher by setting "-mca plm_base_verbose 5
> --debug-daemons" on the cmd line.
>
>
> On Apr 10, 2014, at 9:50 PM, Anthony Alba <ascanio.al...@gmail.com> wrote:
>
> > Is there a way to troubleshoot
> > plm_rsh_no_tree_spawn=true hang?
> >
> > I have a set of passwordless-ssh nodes, each node can ssh into any
> > other, i.e.,
> >
> > for h1 in A B C D; do for h2 in A B C D; do ssh $h1 ssh $h2 hostname; done; done
> >
> > works perfectly.
> >
> > Generally tree spawn works, however there is one host where
> > launching mpirun with tree spawn hangs as soon as there are 6 or more
> > hosts (with launch node also in the host list). If the launcher is not in
> > the host list the hang happens with five hosts.
> >
> > - Anthony
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users