Ah... thanks Gilles. That makes sense. I was stuck thinking there was
an ssh problem on rank 0; it never occurred to me mpirun was doing
something clever there and that those ssh errors were from a different
instance altogether.
It's no problem to put my private key on all instances - I'll go
Adam,
By default, when more than 64 hosts are involved, mpirun uses a tree
spawn to remotely launch the orted daemons.
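
To make the idea concrete, here is a toy model of a tree-based launch. It is
only a sketch: the fan-out of 2, the heap-style layout, the host names and
the function name tree_spawn_plan are all assumptions made for illustration,
not Open MPI's actual routing logic.

```python
def tree_spawn_plan(hosts, fanout=2):
    """Map each launching host to the hosts it would start daemons on.

    Toy model only: hosts are laid out like an array-backed tree, so the
    host at index i launches the hosts at indices i*fanout+1 .. i*fanout+fanout.
    The fan-out value is hypothetical.
    """
    plan = {}
    for i, host in enumerate(hosts):
        first_child = i * fanout + 1
        children = hosts[first_child:first_child + fanout]
        if children:
            plan[host] = children
    return plan


# node0 stands in for the host where mpirun runs (hypothetical names).
hosts = ["node%d" % i for i in range(7)]
plan = tree_spawn_plan(hosts)

for parent, children in plan.items():
    print(parent, "-> ssh ->", ", ".join(children))
```

In this model node1 and node2 open ssh connections themselves, so ssh
failures can come from a host other than the one running mpirun.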
That means you have two options here:
- allow all compute nodes to ssh to each other (e.g. the ssh public key
of *all* the nodes should be in *all* the authorized_keys files)
-