[OMPI users] problem about mpirun on two nodes
Hi all,

I encountered a problem with mpirun and SSH when using OMPI 1.10.0 compiled with gcc, running on CentOS 7.2. When I execute mpirun on my 2-node cluster, I get the errors pasted below.

[douraku@master home]$ mpirun -np 12 a.out
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on one or more nodes.
  Please check your PATH and LD_LIBRARY_PATH settings, or configure OMPI
  with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a lack of common
  network interfaces and/or no route found between them. Please check network
  connectivity (including firewalls and network routing requirements).
--------------------------------------------------------------------------

Here is some information about my settings:

- When only the master node is used, this does not happen.
- OMPI is installed in /opt/openmpi-1.10.0/ on the master node.
- /opt and /home are exported and are NFS-mounted on the slave node.
- The master and slave nodes and their numbers of CPUs are listed in the openmpi-default-hostfile.
- The path to the MPI library was confirmed (no doubt, because /home and /opt are shared).
- Password-less login using a public key has been configured, so I can log in from master to slave, or from slave to master, without a password.

I see similar issues in the FAQ for systems consisting of multiple slave nodes, where SSH login is necessary between the slave nodes due to the "tree structure" of OMPI's launcher. So I am puzzled why the same issue occurs, because my system does not have multiple slave nodes (and password-less login was established in both directions).

I hope I can get some suggestions for solving this issue. Many thanks in advance.
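P.S. In case it matters, these are the checks I am planning to run next from the master (a rough sketch only; "slave" is just a stand-in for my slave node's actual hostname, and the --prefix line is only something I am considering trying):

# mpirun launches orted over ssh non-interactively, so an interactive login
# working is not enough; BatchMode=yes fails instead of prompting.
ssh -o BatchMode=yes slave true && echo "batch ssh OK"

# Check what PATH / LD_LIBRARY_PATH a non-interactive shell on the slave
# actually sees, and whether orted is findable there.
ssh slave 'echo $PATH; echo $LD_LIBRARY_PATH; which orted'

# Alternative to relying on remote dotfiles: tell mpirun the install prefix
# explicitly (same effect as --enable-orterun-prefix-by-default).
mpirun --prefix /opt/openmpi-1.10.0 -np 12 a.out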
Re: [OMPI users] problem about mpirun on two nodes
Jeff,

Thank you for your advice. My bad; I took the wrong screenshot, because I had tested so many different settings. After I went back to the original network settings, the "Permission denied" error of course disappeared, but the other messages were still there.

The master node has two NICs: one for the WAN (via another server) with zone=external, and the other for the slave node with zone=internal. The two NICs on the master are in different subnets. The NIC on the slave node is set to 'internal'. Their status was confirmed with firewall-cmd --get-active-zones.

I temporarily stopped firewalld and the error messages disappeared. I saw six processes running on each node, but now all the processes keep running forever with 100% CPU usage.

-----Original Message-----
From: Jeff Squyres (jsquyres)
To: Open MPI User's List
Sent: Mon, May 23, 2016 9:13 am
Subject: Re: [OMPI users] problem about mpirun on two nodes

On May 21, 2016, at 11:31 PM, dour...@aol.com wrote:
>
> I encountered a problem with mpirun and SSH when using OMPI 1.10.0 compiled with gcc, running on CentOS 7.2.
> When I execute mpirun on my 2-node cluster, I get the errors pasted below.
>
> [douraku@master home]$ mpirun -np 12 a.out
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

This is the key right here: you got a permission denied error when you (assumedly) tried to execute on the remote server.

Triple check your ssh settings to ensure that you can run on the remote server(s) without a password or interactive passphrase entry.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
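P.S. Rather than leaving firewalld stopped, I am thinking of pinning Open MPI's TCP traffic to a fixed port range and opening only that range on the internal zone. This is only a sketch of what I plan to try: the 10000-10100 range is an arbitrary choice of mine, and the MCA parameter names are what I found for the 1.10 series, so I will double-check them with ompi_info before relying on this.

# On both nodes, open the chosen range on the internal zone only.
firewall-cmd --zone=internal --add-port=10000-10100/tcp --permanent
firewall-cmd --reload

# Launch with the out-of-band channel and the TCP BTL restricted to that
# same range (parameter names to be verified with: ompi_info --param all all).
mpirun --mca oob_tcp_dynamic_ipv4_ports 10000-10100 \
       --mca btl_tcp_port_min_v4 10000 \
       --mca btl_tcp_port_range_v4 100 \
       -np 12 a.out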