[OMPI users] mpirun with ssh tunneling

2016-12-25 Thread Adam Sylvester
I'm trying to use Open MPI 1.10.4 to communicate between two Docker
containers running on two different physical machines.  Docker doesn't have
much to do with my question (unless someone has a suggestion for a better
way to do what I'm trying to do :o))... each Docker container is running an
OpenSSH server, which shows up as 172.17.0.1 on the physical hosts:

$ ifconfig docker0
docker0   Link encap:Ethernet  HWaddr 02:42:8E:07:05:A0
  inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
  inet6 addr: fe80::42:8eff:fe07:5a0/64 Scope:Link

The Docker container's ssh port is published on the physical host as port
32768.

The Docker container has a user 'mpirun' which I have public/private ssh
keys set up for.
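For completeness, that key setup can be sketched as follows (the container name and destination path here are placeholders, not taken from this thread):

```shell
# generate a passphrase-less key pair for the mpirun user
ssh-keygen -t rsa -N "" -f ./id_rsa -q
# then install the public key inside the container, e.g.:
#   docker cp id_rsa.pub <container>:/home/mpirun/.ssh/authorized_keys
```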

Let's call the physical hosts host1 and host2; each host is running a
Docker container I'll refer to as docker1 and docker2 respectively.  So,
this means I can...
1. SSH from host1 into docker1:
ssh mpirun@172.17.0.1 -i ssh/id_rsa -p 32768

2. Set up an ssh tunnel from inside docker1, through host2, into docker2,
on local port 4334 (ec2-user is the login user on host2):
ssh -f -N -q -o "TCPKeepAlive yes" -o "ServerAliveInterval 60" -L 4334:172.17.0.1:32768 -l ec2-user host2

3. Update my ~/.ssh/config file to name this host 'docker2':
StrictHostKeyChecking no
Host docker2
  HostName 127.0.0.1
  Port 4334
  User mpirun

4. I can now do 'ssh docker2' and ssh into it without issues.
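Before pointing mpirun at docker2, a quick way to sanity-check that the step-2 forward is actually listening (bash-only, using its built-in /dev/tcp; the port is the 4334 from above):

```shell
# probe the local end of the ssh tunnel
if (exec 3<>/dev/tcp/127.0.0.1/4334) 2>/dev/null; then
  echo "tunnel is up"
  exec 3>&-
else
  echo "tunnel is down"
fi
```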

Here's where I get stuck.  I'd read that Open MPI's mpirun didn't support
ssh'ing on a non-standard port, so I thought I could just do step 3 above
and then list the hosts when running mpirun from docker1:

mpirun --prefix /usr/local -n 2 -H localhost,docker2 /home/mpirun/mpi_hello_world

However, I get:
[3524ae84a26b:00197] [[55635,0],1] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp
(--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------

I'm guessing that something's going wrong when docker2 tries to communicate
back to docker1.  However, I'm not sure what additional tunneling to set up
to support this.  My understanding of ssh tunnels is relatively basic... I
can of course create a tunnel on docker2 back to docker1, but I don't know
how ssh/MPI will "find" it.  I've read a bit about reverse ssh tunneling,
but it's not clear to me exactly what it does, so I don't know how to apply
it here.

Any help is much appreciated!
-Adam
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] mpirun with ssh tunneling

2016-12-25 Thread Gilles Gouaillardet

Adam,


There are several things going on here.

With an up-to-date master, you can specify an alternate ssh port via a
hostfile; see https://github.com/open-mpi/ompi/issues/2224
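On released versions that predate that hostfile feature, a workaround I believe works is to pass the port to the ssh launcher via the plm_rsh_args MCA parameter (note this applies the same port to every remote host):

```shell
# make Open MPI's ssh launcher use port 32768 for all hosts
mpirun --mca plm_rsh_args "-p 32768" -n 2 -H localhost,docker2 /home/mpirun/mpi_hello_world
```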


Open MPI requires more than just ssh:

- remote nodes (orted) need to call back to mpirun (oob/tcp)

- nodes (MPI tasks) need to be able to connect to each other (btl/tcp)


Regarding oob/tcp, your mpirun command line will basically do, under the hood:
ssh docker2 orted ...

Then each task will use a port for btl/tcp, and tasks may connect
directly to each other using the docker IP and that port.


By default, these two ports are dynamic, but you can request static ports
(or a port range) via MCA parameters:
mpirun --mca oob_tcp_static_ipv4_ports xxx --mca btl_tcp_port_min_v4 yyy --mca btl_tcp_port_range_v4 zzz
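With illustrative numbers substituted for the placeholders (these specific ports are not from this thread), that would look like:

```shell
# pin the oob and btl ports so a firewall can be configured around them
mpirun --mca oob_tcp_static_ipv4_ports 10000 \
       --mca btl_tcp_port_min_v4 10001 \
       --mca btl_tcp_port_range_v4 100 \
       -n 2 -H localhost,docker2 /home/mpirun/mpi_hello_world
```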



That does not change the fact that ssh tunneling works with host
addresses, while Open MPI will (internally) use the docker addresses.



I'd rather suggest you try to:
- enable IP connectivity between your containers (possibly running on
different hosts)
- assuming you need (some) network isolation, use static ports and
update your firewall to allow full TCP/IP connectivity on these ports
and on port 22 (ssh)
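As a sketch, if the static ports were pinned to 10000 and 10001-10100 (illustrative numbers, not from this thread), the firewall on each host might be opened along these lines:

```shell
# allow ssh plus the static Open MPI port range between the hosts
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 10000:10100 -j ACCEPT
```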

You can also refer to https://github.com/open-mpi/ompi/issues/1511,
where yet another way to use Docker was discussed.

Last but not least, if you want to use containers but are not tied to
Docker, you can consider http://singularity.lbl.gov/
(as far as Open MPI is concerned, native support is expected in Open MPI 2.1)



Cheers,

Gilles

On 12/26/2016 6:11 AM, Adam Sylvester wrote: