Hi, we have done a POC at work with VXLAN + BGP EVPN, and we have tested a very nice feature: anycast gateway.
Basically, each vmbrX of a specific tenant has the same IP address/MAC address. This IP is the default gateway of the VMs, which means a VM can be migrated across hosts :) (OpenStack has a network model like this called "DVR", distributed virtual routing; VMware NSX has the Distributed Logical Router (DLR).) I think this model works very well with Proxmox, as all Proxmox nodes are masters.

This opens a lot of possibilities:

- distributed DHCP on all cluster nodes (btw, http://kea.isc.org seems to be a very good candidate; it can be extended with custom backends)
- distributed DNS
- SNAT for private-only VMs -> internet (needs 1 public IP on each host)
- 1:1 NAT (aka floating IP in OpenStack, Google Cloud, ...): a public IP is created on the host and "migrates" at the same time as the VM
  https://assafmuller.com/2015/04/15/distributed-virtual-routing-floating-ips/
- a cloud-init metadata server should be easy to implement
- VRF support to isolate the host (avoid connections from a VM to the host gateway IP), or to prevent inter-routing between tenants
- maybe other cool stuff I haven't thought about yet :)

I really like VXLAN BGP EVPN because it's a standard with no central controller, and we can also use it on physical switches/routers, on different Proxmox clusters, and also on Docker/Kubernetes clusters. It is also fully supported in the current Linux kernel (Cumulus Networks works hard on it and uses it on their switches). We only need the Linux kernel + a BGP routing daemon (quagga, frr, bird, gobgp, ...).

I can already configure each bridge and VTEP manually for each network in /etc/network/interfaces (with ifupdown2, https://github.com/CumulusNetworks/ifupdown2/tree/v1.0.0, which already supports the new kernel 4.14 features). I have checked systemd-networkd; it seems to be good, with some optional features missing, like ARP suppression (kernel 4.14), but this could be done easily with some custom code using ip commands.
Basically, we need to define something like this:

local on each node
------------------
/etc/pve/node/

host1:
------
vtep: myvtep
    dstport 4789
    address 203.0.113.1
    learningmode nolearning

(each node has a loopback with this IP address, which is used to generate the VTEP and the local BGP config from it)

host2:
------
vtep: myvtep
    dstport 4789
    address 203.0.113.2
    learningmode nolearning

global
------
/etc/pve/networks.cfg

vxlanebgp: tenantnetwork1
    gateway_address 10.0.1.1/24
    gateway_macaddress a2:ed:21:06:e7:48
    vni 1
    vtep myvtep

vxlanebgp: tenantnetwork2
    gateway_address 10.0.2.1/24
    gateway_macaddress a2:ed:21:06:e7:48
    vni 2
    vtep myvtep

Then when a VM starts, we can generate the bridge with the anycast address, create a VTEP and plug it into the bridge (different VTEPs reuse the same loopback address).

Manual config:

host1 config
------------
/etc/network/interfaces
-----------------------
auto lo
iface lo inet loopback
    pre-up ip addr add 203.0.113.1/32 dev lo

auto vmbr1
iface vmbr1 inet static
    address 10.0.1.1/24
    bridge_ports vxlan1
    bridge_stp off
    bridge_fd 0
    pre-up ip link add vxlan1 type vxlan id 1 dstport 4789 local 203.0.113.1 nolearning
    pre-up ip link set dev vmbr1 address a2:ed:21:06:e7:48
    pre-up brctl addif vmbr1 vxlan1

auto vmbr2
iface vmbr2 inet static
    address 10.0.2.1/24
    bridge_ports vxlan2
    bridge_stp off
    bridge_fd 0
    pre-up ip link add vxlan2 type vxlan id 2 dstport 4789 local 203.0.113.1 nolearning
    pre-up ip link set dev vmbr2 address a2:ed:21:06:e7:48
    pre-up brctl addif vmbr2 vxlan2

quagga bgp config:
------------------
router bgp 65000
 bgp router-id 203.0.113.1
 no bgp default ipv4-unicast
 neighbor fabric peer-group
 neighbor fabric remote-as 65000
 neighbor fabric capability extended-nexthop
 ! BGP sessions with route reflectors or full mesh with all proxmox hosts or routers..
 neighbor 203.0.113.2 peer-group fabric
 neighbor 203.0.113.254 peer-group fabric
 !
 address-family evpn
  neighbor fabric activate
  advertise-all-vni
 exit-address-family
 !
exit
!
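The host1 stanza above and the equivalent host2 one differ only in the local VTEP address; everything else (gateway IP/MAC, VNI) is identical, which is exactly the anycast property. A minimal generator sketch (the function name `render_interfaces` and the dict layout are my own illustration, not an existing Proxmox API):

```python
# Hypothetical sketch: render the ifupdown2 stanzas for one node from the
# proposed global networks.cfg model. Only the per-node VTEP address
# varies; the anycast gateway IP/MAC is identical on every host.

NETWORKS = [
    {"name": "tenantnetwork1", "gateway": "10.0.1.1/24",
     "gateway_mac": "a2:ed:21:06:e7:48", "vni": 1},
    {"name": "tenantnetwork2", "gateway": "10.0.2.1/24",
     "gateway_mac": "a2:ed:21:06:e7:48", "vni": 2},
]

def render_interfaces(vtep_address, networks=NETWORKS):
    """Return the /etc/network/interfaces text for one node."""
    out = [
        "auto lo",
        "iface lo inet loopback",
        f"    pre-up ip addr add {vtep_address}/32 dev lo",
    ]
    for net in networks:
        vni = net["vni"]
        out += [
            "",
            f"auto vmbr{vni}",
            f"iface vmbr{vni} inet static",
            f"    address {net['gateway']}",
            f"    bridge_ports vxlan{vni}",
            "    bridge_stp off",
            "    bridge_fd 0",
            f"    pre-up ip link add vxlan{vni} type vxlan id {vni}"
            f" dstport 4789 local {vtep_address} nolearning",
            f"    pre-up ip link set dev vmbr{vni} address {net['gateway_mac']}",
        ]
    return "\n".join(out)

host1 = render_interfaces("203.0.113.1")
host2 = render_interfaces("203.0.113.2")
# The two configs differ only in the local VTEP address:
assert host1.replace("203.0.113.1", "X") == host2.replace("203.0.113.2", "X")
```

This is just to show that a daemon could derive everything per node from the global networks.cfg plus the node's loopback address.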
host2 config
------------
/etc/network/interfaces
-----------------------
auto lo
iface lo inet loopback
    pre-up ip addr add 203.0.113.2/32 dev lo

auto vmbr1
iface vmbr1 inet static
    address 10.0.1.1/24
    bridge_ports vxlan1
    bridge_stp off
    bridge_fd 0
    pre-up ip link add vxlan1 type vxlan id 1 dstport 4789 local 203.0.113.2 nolearning
    pre-up ip link set dev vmbr1 address a2:ed:21:06:e7:48
    pre-up brctl addif vmbr1 vxlan1

auto vmbr2
iface vmbr2 inet static
    address 10.0.2.1/24
    bridge_ports vxlan2
    bridge_stp off
    bridge_fd 0
    pre-up ip link add vxlan2 type vxlan id 2 dstport 4789 local 203.0.113.2 nolearning
    pre-up ip link set dev vmbr2 address a2:ed:21:06:e7:48
    pre-up brctl addif vmbr2 vxlan2

quagga bgp config:
------------------
router bgp 65000
 bgp router-id 203.0.113.2
 no bgp default ipv4-unicast
 neighbor fabric peer-group
 neighbor fabric remote-as 65000
 neighbor fabric capability extended-nexthop
 ! BGP sessions with route reflectors or full mesh with all proxmox hosts or routers..
 neighbor 203.0.113.1 peer-group fabric
 neighbor 203.0.113.254 peer-group fabric
 !
 address-family evpn
  neighbor fabric activate
  advertise-all-vni
 exit-address-family
 !
exit
!

Regards,

Alexandre

----- Original message -----
From: "aderumier" <aderum...@odiso.com>
To: "dietmar" <diet...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Friday, January 5, 2018 12:26:32
Subject: Re: [pve-devel] proxmox 2018 : add support for "virtual" network and network plugins ?

>> I think we basically have two kinds of networks:
>>
>> 1.) local networks:
>>
>> This is what we already have in /etc/network/interfaces. Access to local networks is usually restricted to admins.
>>
>> 2.) virtual networks:
>>
>> Basically a linux bridge where we can connect VMs to. One can connect such a virtual network to a local network:
>>
>> - directly (this is what we currently use for the firewall)
>> - vlan
>> - vxlan
>>
>> Or we can connect that bridge to some SDN.
>> We can also add additional services to such a virtual network:
>>
>> - SNAT, DNAT
>> - Firewall
>> - DHCP
>> - Routing, ...

Yes, I totally agree with you.

For VXLAN with Linux bridges, I have found very good documentation here:

https://vincent.bernat.im/fr/blog/2017-vxlan-linux
https://vincent.bernat.im/fr/blog/2017-vxlan-bgp-evpn

(In French, sorry.)

But basically, we can create a simple bridge with a VXLAN interface (1 bridge per VXLAN):

Host1 (10.0.0.1)
-------
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.1 \
    group ff05::100 \
    dev eth0 \
    ttl 5
brctl addbr vmbr100
brctl addif vmbr100 vxlan100

ip link add vxlan200 type vxlan \
    id 200 \
    dstport 4789 \
    local 10.0.0.1 \
    group ff05::200 \
    dev eth0 \
    ttl 5
brctl addbr vmbr200
brctl addif vmbr200 vxlan200

Host2 (10.0.0.2)
-------
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.2 \
    group ff05::100 \
    dev eth0 \
    ttl 5
brctl addbr vmbr100
brctl addif vmbr100 vxlan100

ip link add vxlan200 type vxlan \
    id 200 \
    dstport 4789 \
    local 10.0.0.2 \
    group ff05::200 \
    dev eth0 \
    ttl 5
brctl addbr vmbr200
brctl addif vmbr200 vxlan200

This simple setup uses multicast to flood ARP requests to all VTEPs of a VNI. It can work on a layer-2 LAN, but not across the internet.
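The per-host commands above differ only in the `local` address, so they are trivial to generate per node. A tiny sketch (the helper name `multicast_vxlan_cmd` is my own, not an existing tool):

```python
def multicast_vxlan_cmd(vni, local_ip, group="ff05::100", dev="eth0"):
    """Render the `ip link add` command for a multicast-flooded VXLAN.

    BUM traffic (ARP requests, unknown unicast) is flooded to the
    multicast group, so this only works where multicast is available:
    fine on a layer-2 LAN, not across the internet.
    """
    return (f"ip link add vxlan{vni} type vxlan id {vni} dstport 4789 "
            f"local {local_ip} group {group} dev {dev} ttl 5")

print(multicast_vxlan_cmd(100, "10.0.0.1"))
```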
Another mode is to use unicast instead of multicast: the multicast group is replaced by explicit all-zero FDB entries pointing at the other VTEPs (head-end replication), so the `group` option is dropped.

host1
------
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.1
bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 10.0.0.2
brctl addbr vmbr100
brctl addif vmbr100 vxlan100

ip link add vxlan200 type vxlan \
    id 200 \
    dstport 4789 \
    local 10.0.0.1
bridge fdb append 00:00:00:00:00:00 dev vxlan200 dst 10.0.0.2
brctl addbr vmbr200
brctl addif vmbr200 vxlan200

host2
------
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.2
bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 10.0.0.1
brctl addbr vmbr100
brctl addif vmbr100 vxlan100

ip link add vxlan200 type vxlan \
    id 200 \
    dstport 4789 \
    local 10.0.0.2
bridge fdb append 00:00:00:00:00:00 dev vxlan200 dst 10.0.0.1
brctl addbr vmbr200
brctl addif vmbr200 vxlan200

This works fine for small setups, as ARP traffic is replicated in unicast to all VTEPs.

So to avoid this flooding (for big networks), we can disable learning on the VNI and use a BGP daemon (the BGP EVPN protocol) to sync the FDB:

host1:
-------
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.1 \
    nolearning

host2:
-------
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.2 \
    nolearning

Then quagga or frr runs locally on each host, peering with the other hosts or through BGP route reflectors (see the doc).

There is also a description of a manual FDB setup (this could be done by a Proxmox daemon, since we know the MAC addresses of the VMs, but it would only work within 1 Proxmox cluster). The documentation also has examples of the behaviour of Docker libnetwork and flannel.

It could be great to have something easy to set up, without needing to configure each host manually.
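In the unicast mode above, each host needs one all-zero FDB entry per *remote* VTEP. A cluster daemon that knows the membership could compute these; a minimal sketch (the helper name `fdb_commands` is my own illustration):

```python
def fdb_commands(vni, local_ip, all_vteps):
    """Return the `bridge fdb append` commands for head-end replication:
    one all-zero MAC entry per remote VTEP, so BUM traffic is
    replicated in unicast to every other host (never to ourselves)."""
    return [
        f"bridge fdb append 00:00:00:00:00:00 dev vxlan{vni} dst {vtep}"
        for vtep in all_vteps if vtep != local_ip
    ]

cmds = fdb_commands(100, "10.0.0.1", ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
# host1 points at host2 and host3, never at itself:
assert cmds == [
    "bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 10.0.0.2",
    "bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 10.0.0.3",
]
```

The same loop would have to run on every host whenever a node joins or leaves, which is exactly the bookkeeping that BGP EVPN makes unnecessary.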
For example, something like /etc/pve/network.conf:

vxlanplugin: customer1
    vxlan 100
    underlay_network 10.0.0.0/8

and in the vm config:

net0: virtio=....,network=customer1

This would create the vmbr100 bridge with the vxlan100 interface, take the local IP of each host, do the unicast config with all the other hosts if needed, ....

----- Original message -----
From: "dietmar" <diet...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>, "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Thursday, January 4, 2018 09:30:52
Subject: Re: [pve-devel] proxmox 2018 : add support for "virtual" network and network plugins ?

I think we basically have two kinds of networks:

1.) local networks:

This is what we already have in /etc/network/interfaces. Access to local networks is usually restricted to admins.

2.) virtual networks:

Basically a linux bridge where we can connect VMs to. One can connect such a virtual network to a local network:

- directly (this is what we currently use for the firewall)
- vlan
- vxlan

Or we can connect that bridge to some SDN.

We can also add additional services to such a virtual network:

- SNAT, DNAT
- Firewall
- DHCP
- Routing, ...

> On January 2, 2018 at 3:04 PM Alexandre DERUMIER <aderum...@odiso.com> wrote:
> I think we have 2 kinds of setup:
>
> - basic local vswitch (bridge, ovs, snabbswitch, ....): can be easily set up with systemd-networkd + some tap/eth plug/unplug scripts.
> - bigger SDN setup, with external controllers (which could manage networks across multiple proxmox clusters too).

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel