Hi,

On Tue, Sep 27, 2016 at 03:09:52PM +0000, Neil Jerram wrote:
> Attached are 3 patches that my team has been using for routing through
> IP-in-IP tunnels, rebased on 1.6.1. I'd like to explain why we find them
> useful, and start a conversation about whether they or something like them
> could be upstreamed (or perhaps if there's some better way of achieving our
> aims).
>
> Calico [1] uses BIRD for BGP routing between the hosts in various cloud
> orchestration systems (Kubernetes, OpenStack etc.), to distribute routes to
> the pods/VMs/containers in those systems, each of which has its own IP. If
> all the hosts are directly connected to each other, this is
> straightforward, but sometimes they are not. For example GCE instances are
> not directly connected to each other: there is at least one router between
> them, that knows about routing GCE addresses, and to/from the Internet, and
> we cannot peer with it or otherwise tell it how to route pod/VM/container
> IPs. So if we use GCE to create e.g. OpenStack compute hosts, with Calico
> networking, we need to do something extra to allow VM-addressed data to
> pass between the compute hosts.
>
> One of our solutions is to use IP-in-IP; it works as shown by this diagram:
>
>           10.65.0.3 via 10.240.0.5 dev tunl0 onlink
>           default via 10.240.0.1
>                 |
>               +-|----------+                             +------------+
>               | o          |                             |            |
>               |   Host A   |         +--------+          |   Host B   |
>               |            |---------| Router |----------|            |
>               | 10.240.0.4 |         +--------+          | 10.240.0.5 |
>               |            |---.                         |            |
>               +------------+   |                         +------------+
>                   ^      ^ +---v---+                           |
>   src 10.65.0.2   |      | | tunl0 |                           |
>   dst 10.65.0.3   |      | +-------+                           |
>                   |      \     |                               v
>             +-----------+ '----'                         +-----------+
>             |   Pod A   | src 10.240.0.4                 |   Pod B   |
>             | 10.65.0.2 | dst 10.240.0.5                 | 10.65.0.3 |
>             +-----------+     ------                     +-----------+
>                            src 10.65.0.2
>                            dst 10.65.0.3

Can't you just use a tunnel between Host A and Host B and run BGP on top
of this tunnel? It would seem to be cleaner than hacking multi-hop BGP
to obtain appropriate next-hop values, unless I am missing something.

It would look something like this:

+-|----------+                             +------------+
| o Host A   |                             |   Host B   |
|            |         +--------+          |            |
|  10.240.0.4|---------| Router |----------|10.240.0.5  |
|            |         +--------+          |            |
|   10.65.0.4|--.  +-------+   +-------+ .->10.65.0.5   |
+------------+   `>| tunlA |-->| tunlB |-  +------------+
                   +-------+   +-------+

The BGP session would be established between 10.65.0.4 (IP of host A on
tunlA) and 10.65.0.5 (IP of host B on tunlB), so that the routes learnt
via BGP would be immediately correct. Basically, it's a simple overlay
network.

> The diagram shows Pod A sending a packet to Pod B, using IP addresses that
> are unknown to the 'Router' between the two hosts. Host A has an IP-in-IP
> device, tunl0, and a route that says to use that device for data to Pod B's
> address (10.65.0.3). When the packet has passed through that device, it
> has a new outer IP header, with src 10.240.0.4 and dst 10.240.0.5, and is
> routed again according to the routing table - so now it can successfully
> reach Host B.
>
> So how is BIRD involved? We statically program the local Pod route on each
> host:
>
>    On Host A: 10.65.0.2 dev <interface to Pod A>
>    On Host B: 10.65.0.3 dev <interface to Pod B>
>
> then run a BIRD BGP session between Host A and Host B to propagate those
> routes to the other host - which would normally give us:
>
>    On Host A: 10.65.0.3 via 10.240.0.5
>    On Host B: 10.65.0.2 via 10.240.0.4
>
> But we don't want those normal routes, because then the data would get lost
> at 'Router'. So we enhance and configure BIRD as follows.
>
> - In the export filter for protocol kernel, for the relevant routes, we set
>   an attribute 'krt_tunnel = tunl0'.
>
> - We modify BIRD, as in the attached patches, to understand that that means
>   that those routes should have 'dev tunl0'.
>
> Then instead, we get:
>
>    On Host A: 10.65.0.3 via 10.240.0.5 dev tunl0 onlink
>    On Host B: 10.65.0.2 via 10.240.0.4 dev tunl0 onlink
>
> which allows successful routing of data between the Pods.
>
>
> Thanks for reading this far! I now have three questions:
>
> 1. Does the routing approach above make sense? (Or is there some better or
> simpler or already supported way that we could achieve the same thing?)
>
> 2. If (1), would the BIRD team accept patches broadly on the lines of those
> that are attached?
>
> 3. If (2), please let me know if the attached patches are already
> acceptable, or otherwise what further work is needed for them.
>
> Many thanks,
>      Neil
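
To make the double lookup in the quoted description concrete, here is a
minimal Python sketch of what Host A does with a packet from Pod A. It is
only a model under stated assumptions, not how the kernel or BIRD
implements anything: the table mirrors the two routes at the top of the
quoted diagram, "eth0" is an assumed name for Host A's uplink, and
lookup() and send() are hypothetical helpers.

```python
import ipaddress

# Host A's routes as shown in the quoted diagram: (prefix, next hop, device).
# "eth0" is an assumed interface name; the diagram does not name the uplink.
ROUTES_HOST_A = [
    (ipaddress.ip_network("10.65.0.3/32"), "10.240.0.5", "tunl0"),
    (ipaddress.ip_network("0.0.0.0/0"), "10.240.0.1", "eth0"),
]


def lookup(dst, routes):
    """Longest-prefix match; returns (next_hop, device)."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes if addr in r[0]]
    if not matches:
        raise LookupError("no route to " + dst)
    best = max(matches, key=lambda r: r[0].prefixlen)
    return best[1], best[2]


def send(packet, routes, local_addr):
    """Route a packet; a route via tunl0 wraps it and routes it again."""
    next_hop, dev = lookup(packet["dst"], routes)
    if dev == "tunl0":
        # IP-in-IP: add an outer header addressed to the route's next hop,
        # then look the outer destination up in the same table.
        outer = {"src": local_addr, "dst": next_hop, "inner": packet}
        return send(outer, routes, local_addr)
    return packet, next_hop, dev


inner = {"src": "10.65.0.2", "dst": "10.65.0.3"}
wire, next_hop, dev = send(inner, ROUTES_HOST_A, "10.240.0.4")
print(wire["src"], wire["dst"], "via", next_hop, "dev", dev)
```

The first lookup matches the /32 on tunl0, and the second lookup sends the
encapsulated packet out via the default route, so 'Router' only ever sees
the 10.240.0.x addresses.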