Hi,

we did a PoC at work with vxlan + bgp-evpn,
and we tested a very nice feature: anycast gateway.

Basically, each vmbrX of a specific tenant has the same IP address/MAC address
on every host. This IP is the default gateway of the VMs.

That means VMs can be migrated across hosts :)

(OpenStack has a network model like this called DVR, distributed virtual
routing; VMware NSX has the Distributed Logical Router (DLR).)

I think this model works very well with Proxmox, as all Proxmox nodes are
masters.


This opens up a lot of possibilities:

- distributed DHCP on all cluster nodes (btw, http://kea.isc.org seems to be a
very good candidate; it can be extended with custom backends)
- distributed DNS
- S-NAT for private-only VMs -> internet (needs 1 public IP on each host)
- 1:1 NAT (aka floating IP in OpenStack, Google Cloud, ...): a public IP is
created on a host and "migrates" at the same time as the VM
  https://assafmuller.com/2015/04/15/distributed-virtual-routing-floating-ips/
- a cloud-init metadata server should be easy to implement
- add VRF support to isolate the host (prevent VMs from connecting to the host
gateway IP), or prevent inter-routing between tenants
- maybe other cool stuff I haven't thought about yet :)
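For the two NAT items, a rough sketch of the rules each host could install
(the interface name eth0, the floating IP 198.51.100.10 and the VM IP
10.0.1.50 are made-up examples, not part of the PoC):

```shell
# S-NAT: private tenant network -> internet via the host's own public IP
iptables -t nat -A POSTROUTING -s 10.0.1.0/24 -o eth0 -j MASQUERADE

# 1:1 NAT (floating IP): 198.51.100.10 <-> VM 10.0.1.50.
# The address is added on the host currently running the VM,
# and would be moved to the target host on migration.
ip addr add 198.51.100.10/32 dev eth0
iptables -t nat -A PREROUTING  -d 198.51.100.10 -j DNAT --to-destination 10.0.1.50
iptables -t nat -A POSTROUTING -s 10.0.1.50 -j SNAT --to-source 198.51.100.10
```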



I really like vxlan bgp-evpn because it's a standard with no central
controller, and we can also use it on physical switches/routers, across
different Proxmox clusters, and also on Docker/Kubernetes clusters.

Also, this is fully supported in the current Linux kernel (Cumulus Networks
works hard on it and uses it on their switches).
We only need the Linux kernel + a BGP routing daemon (quagga, frr, bird,
gobgp, ...).


I can already manually configure each bridge and VTEP for each network in
/etc/network/interfaces
(ifupdown2, https://github.com/CumulusNetworks/ifupdown2/tree/v1.0.0, already
supports the new kernel 4.14 features).
I have also checked systemd-networkd; it seems good, but with some optional
features missing, like ARP suppression (kernel 4.14).

But this could also be done easily with some custom code using ip commands.
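As a sketch, the whole bridge + VTEP setup for one VNI can be expressed with
plain iproute2 calls (the values mirror the manual example config; modern
`ip link ... master` replaces brctl):

```shell
# Create the anycast bridge + VTEP for one VNI with plain ip commands (as root).
VNI=1
LOCAL_IP=203.0.113.1            # per-host loopback / VTEP address
GW_IP=10.0.1.1/24               # anycast gateway IP, identical on every host
GW_MAC=a2:ed:21:06:e7:48        # anycast gateway MAC, identical on every host

ip link add "vmbr$VNI" type bridge stp_state 0 forward_delay 0
ip link set "vmbr$VNI" address "$GW_MAC"
ip addr add "$GW_IP" dev "vmbr$VNI"
ip link add "vxlan$VNI" type vxlan id "$VNI" dstport 4789 local "$LOCAL_IP" nolearning
ip link set "vxlan$VNI" master "vmbr$VNI"
ip link set "vxlan$VNI" up
ip link set "vmbr$VNI" up
```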


Basically, we need to define something like:

local on each node
-------------------
/etc/pve/node/

host1:
------
vtep :  myvtep
        dstport 4789 
        address 203.0.113.1 
        learningmode nolearning

(each node has a loopback with this IP address, which is used to generate the
VTEP and also the local BGP config)

host2:
vtep :  myvtep
        dstport 4789 
        address 203.0.113.2 
        learningmode nolearning

global
------
/etc/pve/networks.cfg

vxlanebgp: tenantnetwork1
           gateway_address 10.0.1.1/24
           gateway_macaddress a2:ed:21:06:e7:48
           vni 1
           vtep myvtep


vxlanebgp: tenantnetwork2
           gateway_address 10.0.2.1/24
           gateway_macaddress a2:ed:21:06:e7:48
           vni 2
           vtep myvtep



Then, when a VM starts, we can generate the bridge with the anycast address,
create the VTEP and plug it into the bridge (different VTEPs reuse the same
loopback address).


manual config:

host1 config
------------
/etc/network/interfaces
-----------------------
auto lo
iface lo inet loopback
        pre-up ip addr add 203.0.113.1/32 dev lo

auto vmbr1
iface vmbr1 inet static
        address 10.0.1.1/24
        bridge_ports vxlan1
        bridge_stp off
        bridge_fd 0
        pre-up ip link add vxlan1 type vxlan id 1 dstport 4789 local 203.0.113.1 nolearning
        pre-up ip link set dev vmbr1 address a2:ed:21:06:e7:48
        pre-up brctl addif vmbr1 vxlan1

auto vmbr2
iface vmbr2 inet static
        address 10.0.2.1/24
        bridge_ports vxlan2
        bridge_stp off
        bridge_fd 0
        pre-up ip link add vxlan2 type vxlan id 2 dstport 4789 local 203.0.113.1 nolearning
        pre-up ip link set dev vmbr2 address a2:ed:21:06:e7:48
        pre-up brctl addif vmbr2 vxlan2

quagga bgp config:
------------------


router bgp 65000
  bgp router-id 203.0.113.1
  no bgp default ipv4-unicast
  neighbor fabric peer-group
  neighbor fabric remote-as 65000
  neighbor fabric capability extended-nexthop
  ! BGP sessions with route reflectors, or a full mesh with all proxmox hosts or routers
  neighbor 203.0.113.2 peer-group fabric
  neighbor 203.0.113.254 peer-group fabric
  !
  address-family evpn
   neighbor fabric activate
   advertise-all-vni
  exit-address-family
  !
  exit
!


host2 config
------------
/etc/network/interfaces
-----------------------
auto lo
iface lo inet loopback
        pre-up ip addr add 203.0.113.2/32 dev lo

auto vmbr1
iface vmbr1 inet static
        address 10.0.1.1/24
        bridge_ports vxlan1
        bridge_stp off
        bridge_fd 0
        pre-up ip link add vxlan1 type vxlan id 1 dstport 4789 local 203.0.113.2 nolearning
        pre-up ip link set dev vmbr1 address a2:ed:21:06:e7:48
        pre-up brctl addif vmbr1 vxlan1

auto vmbr2
iface vmbr2 inet static
        address 10.0.2.1/24
        bridge_ports vxlan2
        bridge_stp off
        bridge_fd 0
        pre-up ip link add vxlan2 type vxlan id 2 dstport 4789 local 203.0.113.2 nolearning
        pre-up ip link set dev vmbr2 address a2:ed:21:06:e7:48
        pre-up brctl addif vmbr2 vxlan2

quagga bgp config:
------------------

router bgp 65000
  bgp router-id 203.0.113.2
  no bgp default ipv4-unicast
  neighbor fabric peer-group
  neighbor fabric remote-as 65000
  neighbor fabric capability extended-nexthop
  ! BGP sessions with route reflectors, or a full mesh with all proxmox hosts or routers
  neighbor 203.0.113.1 peer-group fabric
  neighbor 203.0.113.254 peer-group fabric
  !
  address-family evpn
   neighbor fabric activate
   advertise-all-vni
  exit-address-family
  !
  exit
!
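Once both hosts are configured, the control plane and the synced fdb can be
checked; a sketch (frr's vtysh syntax shown here, the Cumulus quagga fork uses
slightly different show commands):

```shell
# Control plane: BGP EVPN sessions and the routes learned from the other VTEPs
vtysh -c "show bgp l2vpn evpn summary"
vtysh -c "show bgp l2vpn evpn"

# Data plane: remote VM MACs synced via EVPN show up in the fdb
bridge fdb show dev vxlan1
```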



Regards,

Alexandre






----- Original Message -----
From: "aderumier" <aderum...@odiso.com>
To: "dietmar" <diet...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Friday, January 5, 2018 12:26:32
Subject: Re: [pve-devel] proxmox 2018 : add support for "virtual" network and
network plugins ?

>>I think we basically have two kinds of networks: 
>> 
>>1.) local networks: 
>> 
>>This is what we already have in /etc/network/interface. Access to local 
>>network 
>>is 
>>usually restricted to admins. 
>> 
>>2.) virtual networks: 
>> 
>>Basically a linux bridge where we can connect VM to. One can connect such 
>>virtual network to local network: 
>> 
>>- directly (this is what we currently use for the firewall) 
>>- vlan 
>>- vxlan 
>> 
>>Or we can connect that bridge to some SDN. 
>> 
>>We can also add additional service to such virtual network: 
>> 
>>- SNAT, DNAT 
>>- Firewall 
>>- DHCP 
>>- Routing, ... 

Yes, I totally agree with you.




For vxlan with linux bridge, I have found very good documentation here: 

https://vincent.bernat.im/fr/blog/2017-vxlan-linux 
https://vincent.bernat.im/fr/blog/2017-vxlan-bgp-evpn 

(In French, sorry.)

But basically, we can:

create a simple bridge with a vxlan interface (1 bridge per vxlan)

Host1 (10.0.0.1) 
------- 
ip link add vxlan100 type vxlan \ 
id 100 \ 
dstport 4789 \ 
local 10.0.0.1 \ 
group 239.1.1.100 \
dev eth0 \ 
ttl 5 

# brctl addbr vmbr100 
# brctl addif vmbr100 vxlan100 


ip link add vxlan200 type vxlan \ 
id 200 \
dstport 4789 \ 
local 10.0.0.1 \ 
group 239.1.1.200 \
dev eth0 \ 
ttl 5 

# brctl addbr vmbr200 
# brctl addif vmbr200 vxlan200 


Host2 (10.0.0.2) 
------- 
ip link add vxlan100 type vxlan \ 
id 100 \ 
dstport 4789 \ 
local 10.0.0.2 \ 
group 239.1.1.100 \
dev eth0 \ 
ttl 5 

# brctl addbr vmbr100 
# brctl addif vmbr100 vxlan100 


ip link add vxlan200 type vxlan \ 
id 200 \
dstport 4789 \ 
local 10.0.0.2 \ 
group 239.1.1.200 \
dev eth0 \ 
ttl 5 

# brctl addbr vmbr200 
# brctl addif vmbr200 vxlan200 


This simple setup uses multicast to flood ARP requests to all VTEPs of a VNI.
It can work on a layer-2 LAN, but not across the internet.

Another mode is to use unicast instead of multicast
---------------------------------------------------------- 
host1 
------ 
ip link add vxlan100 type vxlan \ 
id 100 \ 
dstport 4789 \ 
local 10.0.0.1 \ 
dev eth0 \ 
ttl 5 
# bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 10.0.0.2 
# brctl addbr vmbr100 
# brctl addif vmbr100 vxlan100 


ip link add vxlan200 type vxlan \ 
id 200 \
dstport 4789 \ 
local 10.0.0.1 \ 
dev eth0 \ 
ttl 5 
# bridge fdb append 00:00:00:00:00:00 dev vxlan200 dst 10.0.0.2
# brctl addbr vmbr200 
# brctl addif vmbr200 vxlan200 

host2 
------ 
ip link add vxlan100 type vxlan \ 
id 100 \ 
dstport 4789 \ 
local 10.0.0.2 \ 
dev eth0 \ 
ttl 5 
# bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 10.0.0.1 
# brctl addbr vmbr100 
# brctl addif vmbr100 vxlan100 


ip link add vxlan200 type vxlan \ 
id 200 \
dstport 4789 \ 
local 10.0.0.2 \ 
dev eth0 \ 
ttl 5 
# bridge fdb append 00:00:00:00:00:00 dev vxlan200 dst 10.0.0.1
# brctl addbr vmbr200 
# brctl addif vmbr200 vxlan200 


This works fine for small setups, as BUM traffic (ARP, etc.) is replicated in
unicast to every remote VTEP.
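The per-peer fdb entries above can be generated from a simple peer list; a
small sketch that only prints the commands (the peer IPs are the example
addresses, and every host would loop over all the other hosts):

```shell
# Head-end replication sketch: one all-zeros fdb entry per remote VTEP.
# Prints the commands instead of executing them (which would need root).
DEV=vxlan100
PEERS="10.0.0.2 10.0.0.3"   # the VTEP address of every other host

for peer in $PEERS; do
    echo "bridge fdb append 00:00:00:00:00:00 dev $DEV dst $peer"
done
```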


So to avoid this flooding (for big networks), we can disable learning on the
VNI and use a BGP daemon (bgp-evpn protocol) to sync the fdb:
host1: 
------- 
ip link add vxlan100 type vxlan \
id 100 \ 
dstport 4789 \ 
local 10.0.0.1 \ 
nolearning 

host2 
------- 
ip link add vxlan100 type vxlan \
id 100 \ 
dstport 4789 \ 
local 10.0.0.2 \ 
nolearning 

Then quagga or frr runs locally on each host, peering with the other hosts or
through BGP route reflectors (see the doc).



There is also a description of a manual fdb setup (this could be done by a
Proxmox daemon, since we know the MAC addresses of the VMs, but it would only
work for a single Proxmox cluster).
There are also examples in the documentation of the behaviour of Docker
libnetwork and flannel.


It would be great to have something easy to set up, without needing to
configure each host manually.
For example, something like
/etc/pve/network.conf:

vxlanplugin: customer1
vxlan 100
underlay_network 10.0.0.0/8

and in the vm config: net0: virtio=....,network=customer1

This would create the vmbr100 bridge with the vxlan100 interface, take the
local IP of each host, and do the unicast config with all the other hosts if
needed, ....
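Purely as a sketch of the idea (the plugin and its logic don't exist yet; the
local-IP detection is just a placeholder), the per-host commands derived from
such a customer1 entry could look like:

```shell
# Hypothetical: commands a vxlanplugin could derive on each host from
#   vxlanplugin: customer1 / vxlan 100 / underlay_network 10.0.0.0/8
VNI=100
LOCAL_IP=10.0.0.1   # placeholder: the host's own address inside underlay_network

CMDS="ip link add vxlan$VNI type vxlan id $VNI dstport 4789 local $LOCAL_IP
ip link add vmbr$VNI type bridge
ip link set vxlan$VNI master vmbr$VNI
ip link set vxlan$VNI up
ip link set vmbr$VNI up"

echo "$CMDS"
```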



From: "dietmar" <diet...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>, "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Thursday, January 4, 2018 09:30:52
Subject: Re: [pve-devel] proxmox 2018 : add support for "virtual" network and
network plugins ?

I think we basically have two kinds of networks: 

1.) local networks: 

This is what we already have in /etc/network/interface. Access to local network 
is 
usually restricted to admins. 

2.) virtual networks: 

Basically a linux bridge where we can connect VM to. One can connect such 
virtual network to local network: 

- directly (this is what we currently use for the firewall) 
- vlan 
- vxlan 

Or we can connect that bridge to some SDN. 

We can also add additional service to such virtual network: 

- SNAT, DNAT 
- Firewall 
- DHCP 
- Routing, ... 


> On January 2, 2018 at 3:04 PM Alexandre DERUMIER <aderum...@odiso.com> wrote: 
> I think we have 2 kind of setup: 
> 
> - basic local vswitch (bridge, ovs, snabwitch,....) : can be easily setup 
> with 
> systemd-network + some tap/eth plug/unplug scripts. 
> - bigger sdn setup, with external controllers. (which could manage networks 
> across multiple proxmox clusters too) 
_______________________________________________ 
pve-devel mailing list 
pve-devel@pve.proxmox.com 
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel