I'm running OpenMPI 2.1.0, built from source, on RHEL 7. I'm using the default ssh-based launcher, where I have my private ssh key on rank 0 and the associated public key on all ranks. I create a hosts file with a list of unique IPs, with the host that I'm running mpirun from on the first line, and run this command:
mpirun -N 1 --bind-to none --hostfile hosts.txt hostname

This works fine with up to 64 machines. With 65 or more, I get ssh errors. Most often it is "Permission denied (publickey,gssapi-keyex,gssapi-with-mic)", though today another user got "Host key verification failed".

I have confirmed that I can ssh into these instances manually. I've also written a bash loop that backgrounds an ssh sleep command to more than 64 instances at once, and this succeeds. From what I can tell, the /etc/ssh/ssh*config settings that limit ssh connections apply to inbound connections, not outbound ones. Running plain ssh commands also shows that I'm not hitting such a limit.

Is there something wrong with my mpirun syntax? I've run it this way thousands of times with 64 or fewer hosts and never had issues, and I know MPI is frequently used on orders of magnitude more hosts than this. Or is this a known bug that's addressed in a later OpenMPI release?

Thanks for the help.
-Adam
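A minimal sketch of the kind of bash check described above. It is an illustrative reconstruction, not the original script. It assumes that hosts.txt holds one host per line and that only the first field matters, so any trailing "slots=N" is ignored. The function name check_hosts and the SSH_CMD and SLEEP_SECS variables are hypothetical and exist only for this example. Every connection is started in the background so that all of them are open at the same time, as in the >64-host test.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: open one backgrounded "ssh <host> sleep" per hostfile
# entry so that all connections are live at the same time, then report
# which hosts failed. Requires bash 4+ for the associative array.

check_hosts() {
  local hostfile="${1:-hosts.txt}"     # assumed: one host per line
  local ssh_cmd="${SSH_CMD:-ssh}"      # hypothetical override for dry runs
  local sleep_secs="${SLEEP_SECS:-5}"  # how long each connection stays open
  local -A pids=()
  local host rest failed=0

  while read -r host rest; do
    # Skip blank lines and comments; only the first field (the host) is used.
    [[ -z "$host" || "$host" == \#* ]] && continue
    # BatchMode=yes makes ssh fail instead of prompting for a password or
    # host key; </dev/null keeps ssh from consuming the hostfile on stdin.
    $ssh_cmd -o BatchMode=yes "$host" sleep "$sleep_secs" </dev/null >/dev/null 2>&1 &
    pids["$host"]=$!
  done < "$hostfile"

  # Wait for every background ssh and check its exit status.
  for host in "${!pids[@]}"; do
    if ! wait "${pids[$host]}"; then
      echo "FAILED: $host"
      failed=$((failed + 1))
    fi
  done

  echo "hosts: ${#pids[@]}, failed: $failed"
  (( failed == 0 ))
}
```

Running `check_hosts hosts.txt` prints one FAILED line for each host whose ssh exited non-zero, followed by a summary line. The function returns non-zero if any host failed. Because BatchMode is enabled, a missing key or an unknown host key shows up as a failure rather than a hung prompt.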
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users