Hi all,

I encountered a problem with mpirun and SSH when using OMPI 1.10.0 compiled with gcc, running on CentOS 7.2. When I execute mpirun on my 2-node cluster, I get the errors pasted below.

[douraku@master home]$ mpirun -np 12 a.out
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------

Here is some information about my settings.

- When only the master node is used, this does not happen.
- Open MPI is installed in /opt/openmpi-1.10.0/ on the master node.
- /opt and /home are exported and are NFS-mounted on the slave node.
- The master and slave nodes and their numbers of CPUs are listed in the openmpi-default-hostfile.
- The path to the MPI library was confirmed (no doubt there, because /home and /opt are shared).
- Password-less login using a public key has been configured, so I can log in from master to slave, or from slave to master, without a password.

The FAQ describes similar issues on systems consisting of multiple slave nodes, where ssh login is necessary between the slave nodes due to the "tree structure" of ompi.
So I am puzzled why the same issue occurs here, because my system does not have multiple slave nodes (and password-less login was established in both directions). I would appreciate any suggestions for solving this issue.

Many thanks in advance.
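
For anyone trying to reproduce this, here is a minimal sketch of how the password-less login could be checked non-interactively, which is closer to how mpirun uses ssh than an interactive login is. The function name check_host, the SSH_BIN override, and the host name "slave" are illustrative assumptions and are not taken from the setup described above. BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# Sketch: verify that a host accepts an SSH login without any prompt.
# SSH_BIN is a hypothetical override for dry runs; it defaults to the real ssh client.
SSH_BIN="${SSH_BIN:-ssh}"

check_host() {
    host="$1"
    # BatchMode=yes makes ssh fail instead of asking for a password or passphrase,
    # so a success here means key-based login works without user interaction.
    if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
        echo "$host: non-interactive login OK"
    else
        echo "$host: non-interactive login FAILED"
        return 1
    fi
}
```

Running `check_host slave` on the master node, and `check_host master` on the slave node, as the same user that runs mpirun, should show whether the key-based login also works when no terminal prompt is available.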