I'm a little confused: the "no_tree_spawn=true" option means that we are *not* using tree spawn, so mpirun launches each daemon directly onto its node. That requires the host mpirun runs on to be able to ssh directly to every other host in the allocation.
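Because no_tree_spawn has mpirun ssh to every node itself, a quick pre-check run on the launch host can rule out an ssh problem before looking at mpirun. This is only a sketch, assuming a POSIX-style shell and OpenSSH. The function name `check_direct_ssh` and the `SSH` override variable are hypothetical. `-o BatchMode=yes` makes ssh fail instead of prompting for a password, since a prompt that nobody can answer during a launch could look like a hang.

```shell
#!/bin/sh
# Sketch: run on the host where mpirun will be started.
# Checks that this host can ssh non-interactively to every listed host,
# which is what plm_rsh_no_tree_spawn=true relies on.
# SSH is a hypothetical override (e.g. for a dry run); it defaults to ssh.
check_direct_ssh() {
    local ssh_cmd failed h
    ssh_cmd=${SSH:-ssh}
    failed=""
    for h in "$@"; do
        # BatchMode=yes: fail rather than prompt for a password or passphrase.
        # ConnectTimeout=5: give up on an unreachable host after 5 seconds.
        # </dev/null keeps ssh from consuming the loop's stdin.
        if ! "$ssh_cmd" -o BatchMode=yes -o ConnectTimeout=5 "$h" true </dev/null; then
            failed="$failed $h"
        fi
    done
    if [ -n "$failed" ]; then
        echo "cannot ssh directly to:$failed"
        return 1
    fi
    echo "direct ssh OK to all $# hosts"
    return 0
}
```

For example, `check_direct_ssh A B C D` on the node where mpirun is started checks exactly the direct connections that a no-tree-spawn launch needs, which the all-pairs loop in the original message also covers but less directly.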
You can debug the rsh launcher by adding "-mca plm_base_verbose 5 --debug-daemons" to the command line.

On Apr 10, 2014, at 9:50 PM, Anthony Alba <ascanio.al...@gmail.com> wrote:

> Is there a way to troubleshoot a plm_rsh_no_tree_spawn=true hang?
>
> I have a set of passwordless-ssh nodes; each node can ssh into any other, i.e.,
>
>     for h1 in A B C D; do for h2 in A B C D; do ssh $h1 ssh $h2 hostname; done; done
>
> works perfectly.
>
> Generally tree spawn works; however, there is one host where launching mpirun with tree spawn hangs as soon as there are 6 or more hosts (with the launch node also in the host list). If the launcher is not in the host list, the hang happens with five hosts.
>
> - Anthony
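The verbose flags suggested in the reply can be combined with the no-tree-spawn setting in a single trivial launch. This is a sketch under stated assumptions: the function name `debug_no_tree_spawn_launch`, the `MPIRUN` override variable, and running `hostname` as the payload are all hypothetical choices, and the hostfile and process count are supplied by the caller. The MCA parameters and `--debug-daemons`, `--hostfile`, `-np` are mpirun options; `1` is used as the boolean value for `plm_rsh_no_tree_spawn`.

```shell
#!/bin/sh
# Sketch: launch a trivial job without tree spawn and with launcher debugging on.
# MPIRUN is a hypothetical override (defaults to mpirun), e.g. for a dry run.
# $1 = hostfile, $2 = number of processes.
debug_no_tree_spawn_launch() {
    hostfile=$1
    np=$2
    "${MPIRUN:-mpirun}" \
        -mca plm_rsh_no_tree_spawn 1 \
        -mca plm_base_verbose 5 \
        --debug-daemons \
        --hostfile "$hostfile" \
        -np "$np" \
        hostname
}
```

If the launch hangs, the last remote launch reported in the verbose output may point at the host mpirun cannot reach; running with and without the launch node in the hostfile should reproduce the 6-host and 5-host cases described in the original message.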