[OMPI users] problem about mpirun on two nodes

2016-05-21 Thread douraku
Hi all

I have run into a problem with mpirun and SSH using OMPI 1.10.0, compiled with
gcc and running on CentOS 7.2.
When I execute mpirun on my 2-node cluster, I get the errors pasted below.


[douraku@master home]$ mpirun -np 12 a.out
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
--
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--


Here is some information about my settings:
- When only the master node is used, this does not happen.
- Open MPI is installed in /opt/openmpi-1.10.0/ on the master node.
- /opt and /home are exported from the master and NFS-mounted on the slave node.
- The master and slave, with their numbers of CPUs, are listed in
openmpi-default-hostfile.
- The path to the MPI library was confirmed (no doubt about it, because /home
and /opt are shared).
- Password-less login using public keys has been configured, so I can log in
from master to slave, or from slave to master, without a password.
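A shared /opt does not by itself guarantee that the shell mpirun's ssh launcher gets on the slave has /opt/openmpi-1.10.0/bin on its PATH. A non-interactive remote shell may not read the same startup files as an interactive login. The sketch below checks that from each node, assuming bash/sh on both ends. The helper name check_remote_ompi is hypothetical, not part of Open MPI.

```shell
# check_remote_ompi HOST...: hypothetical helper (not an Open MPI tool).
# Runs 'command -v orted' through a non-interactive ssh session, the same kind
# of session used to start the remote daemon, and reports what that shell finds.
# BatchMode=yes makes ssh fail instead of prompting for a password/passphrase.
check_remote_ompi() {
  rc=0
  for host in "$@"; do
    if found=$(ssh -o BatchMode=yes "$host" 'command -v orted'); then
      echo "$host: $found"
    else
      echo "$host: ssh failed or orted not on non-interactive PATH"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage from the master: `check_remote_ompi slave`. If orted is not found, `mpirun --prefix /opt/openmpi-1.10.0 ...` or the configure option named in the error message (--enable-orterun-prefix-by-default) are the usual ways to point the remote side at the installation.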

I see similar issues in the FAQ for systems with multiple slave nodes, where ssh
login is required between the slave nodes because of Open MPI's "tree"
launch structure. So I am puzzled why the same issue occurs, since my system
does not have multiple slave nodes (and password-less login was established in
both directions).
I hope someone can give me some suggestions for solving this issue.

Many thanks in advance.


Re: [OMPI users] problem about mpirun on two nodes

2016-05-23 Thread douraku
Jeff, thank you for your advice.

My bad. I captured the wrong output, because I had tested so many different
settings. After I went back to the original network settings, the "Permission
denied" message disappeared, of course, but the other messages were still
there. The master node has two NICs: one for the WAN (via another server) in
zone=external, and the other for the slave node in zone=internal. The two NICs
on the master are on different subnets.
The NIC on the slave node is set to 'internal'. Their status was confirmed with
firewall-cmd --get-active-zones.
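firewalld's "internal" zone only opens a fixed set of services, while Open MPI's daemons and TCP transport use dynamically chosen ports. So the internal zone may still block that traffic. A sketch of one alternative to stopping firewalld entirely: move only the cluster-facing NIC into the "trusted" zone (which accepts all traffic), leaving the external NIC alone. The helper name and interface argument are hypothetical. Run it as root on each node with that node's cluster NIC. Depending on whether NetworkManager manages the interface, the zone may need to be set there instead.

```shell
# trust_internal_iface IFACE: hypothetical helper. Binds IFACE to firewalld's
# "trusted" zone in the permanent configuration, reloads firewalld so the
# change takes effect at runtime, then prints the active zones to confirm.
trust_internal_iface() {
  iface=${1:-}
  if [ -z "$iface" ]; then
    echo "usage: trust_internal_iface IFACE" >&2
    return 2
  fi
  firewall-cmd --permanent --zone=trusted --change-interface="$iface" || return 1
  firewall-cmd --reload || return 1
  firewall-cmd --get-active-zones
}
```

This only makes sense if the internal link is a private cluster network; the external interface keeps its "external" zone.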

I temporarily stopped firewalld and the error messages disappeared. I saw six
processes running on each node, but now all the processes keep running
forever at 100% CPU usage.
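The 100% CPU by itself is largely expected, because Open MPI processes busy-poll while waiting for messages. The hang is the real symptom. One possible cause, offered only as a hypothesis: with two NICs on the master, the TCP transport may try to reach the master through its external address, which the slave cannot route to. A way to test this is to restrict both the runtime's out-of-band channel and the MPI TCP transport to the internal subnet with the oob_tcp_if_include and btl_tcp_if_include MCA parameters. The wrapper name and the subnet value below are hypothetical placeholders. An interface name can be used instead of a subnet if it is the same on both nodes.

```shell
# INTERNAL_NET is a hypothetical placeholder: replace it with the subnet of the
# master's cluster-facing NIC (the one in firewalld's "internal" zone).
INTERNAL_NET=${INTERNAL_NET:-192.168.10.0/24}

# mpirun_internal ARGS...: hypothetical wrapper that pins both the runtime's
# out-of-band TCP channel (oob) and the MPI TCP transport (btl) to
# INTERNAL_NET, then passes all remaining arguments to mpirun unchanged.
mpirun_internal() {
  mpirun --mca oob_tcp_if_include "$INTERNAL_NET" \
         --mca btl_tcp_if_include "$INTERNAL_NET" \
         "$@"
}
```

Usage: `mpirun_internal -np 12 a.out`. If the job then completes, the second NIC was the problem, and the same parameters can be made permanent in an MCA parameter file.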


-Original Message-
From: Jeff Squyres (jsquyres) 
To: Open MPI User's List 
Sent: Mon, May 23, 2016 9:13 am
Subject: Re: [OMPI users] problem about mpirun on two nodes

On May 21, 2016, at 11:31 PM, dour...@aol.com wrote:
> 
> I encountered a problem about mpirun and SSH when using OMPI 1.10.0 compiled 
> with gcc, running on centos7.2.
> When I execute mpirun on my 2 node cluster, I get the following errors pasted 
> below.
> 
> [douraku@master home]$ mpirun -np 12 a.out
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

This is the key right here: you got a permission denied error when you
(presumably) tried to execute on the remote server.

Triple check your ssh settings to ensure that you can run on the remote 
server(s) without a password or interactive passphrase entry.
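One way to make that check strict is to use ssh's BatchMode=yes, sketched below. It makes ssh fail instead of prompting, so any host that still needs a password or passphrase is reported rather than quietly answered by hand. The helper name check_batch_ssh is hypothetical. Run it as the same user who runs mpirun, using the hostnames exactly as written in the hostfile. Under BatchMode a host whose key is not yet in known_hosts also fails.

```shell
# check_batch_ssh HOST...: hypothetical helper. BatchMode=yes disables all
# interactive prompts; ConnectTimeout keeps an unreachable host from hanging.
# 'true' is a no-op remote command, so success means login itself worked.
check_batch_ssh() {
  rc=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "ok: $host"
    else
      echo "FAIL: $host"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `check_batch_ssh slave` on the master and `check_batch_ssh master` on the slave.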

-- 
Jeff Squyres
jsquy...@cisco.com

___
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/05/29282.php