Re: [OMPI users] Troubleshooting mpirun with tree spawn hang

2014-04-11 Thread Anthony Alba
Oops, I meant = false. Thanks for the tip; it turns out the fault lay in a specific node that required oob_tcp_if_include to be set.
On Friday, 11 April 2014, Ralph Castain wrote:
> I'm a little confused - the "no_tree_spawn=true" option means that we are
> *not* using tree spawn, and so mpirun

Re: [OMPI users] Troubleshooting mpirun with tree spawn hang

2014-04-11 Thread Ralph Castain
I'm a little confused - the "no_tree_spawn=true" option means that we are *not* using tree spawn, and so mpirun is directly launching each daemon onto its node. Thus, this requires that the host mpirun is on be able to ssh to every other host in the allocation. You can debug the rsh launcher by

[OMPI users] Troubleshooting mpirun with tree spawn hang

2014-04-11 Thread Anthony Alba
Is there a way to troubleshoot a plm_rsh_no_tree_spawn=true hang? I have a set of passwordless-ssh nodes; each node can ssh into any other, i.e.,
  for h1 in A B C D; do for h2 in A B C D; do ssh $h1 ssh $h2 hostname; done; done
works perfectly. Generally tree spawn works; however, there is one hos