On 10/23/18 2:54 PM, Ivan Kudryavtsev wrote:
> Wouldn't a solution like this work seamlessly for large VXLAN networks?
> 
> https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn

We are using that with CloudStack right now. We have a modified version
of 'modifyvxlan.sh':
https://github.com/PCextreme/cloudstack/blob/vxlan-bgp-evpn/scripts/vm/network/vnet/modifyvxlan.sh

Your 'tunnelip' needs to be set on 'lo'; in our case this is
10.255.255.X

We have the script in /usr/share/modifyvxlan.sh so that it's found by
the Agent and we don't overwrite the existing script (which might break
after an upgrade).
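
Roughly, the EVPN-style setup boils down to something like this per VNI
(a simplified sketch, not the exact contents of the script linked above;
the VNI, bridge name and VTEP address are example values, and dstport
4789 is an assumption, it being the IANA VXLAN port):

# sketch: create a unicast VXLAN device bound to the VTEP address on 'lo'
ip link add vxlan867 type vxlan id 867 local 10.255.255.9 dstport 4789 nolearning
ip link add brvx-867 type bridge
ip link set vxlan867 master brvx-867
ip link set vxlan867 up
ip link set brvx-867 up
# no multicast group involved; FRR's BGP EVPN handles MAC/VTEP learning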

Our frr conf on the hypervisor:

frr version 7.1
frr defaults traditional
hostname myfirsthypervisor
log syslog informational
no ipv6 forwarding
service integrated-vtysh-config
!
interface enp81s0f0
 no ipv6 nd suppress-ra
!
interface enp81s0f1
 no ipv6 nd suppress-ra
!
interface lo
 ip address 10.255.255.9/32
 ipv6 address 2001:db8:100::9/128
!
router bgp 4200100123
 bgp router-id 10.255.255.9
 no bgp default ipv4-unicast
 neighbor uplinks peer-group
 neighbor uplinks remote-as external
 neighbor uplinks ebgp-multihop 255
 neighbor enp81s0f0 interface peer-group uplinks
 neighbor enp81s0f1 interface peer-group uplinks
 !
 address-family ipv4 unicast
  network 10.255.255.9/32
  neighbor uplinks activate
  neighbor uplinks next-hop-self
 exit-address-family
 !
 address-family ipv6 unicast
  network 2001:db8:100::9/128
  neighbor uplinks activate
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor uplinks activate
  advertise-all-vni
 exit-address-family
!
line vty
!

Both enp81s0f0 and enp81s0f1 are 100G interfaces connected to Cumulus
Linux routers/switches and they use BGP Unnumbered (IPv6 Link Local) for
their BGP sessions.
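
Once the sessions are up, the EVPN state can be checked from vtysh, for
example:

vtysh -c 'show bgp l2vpn evpn summary'   # BGP sessions carrying the EVPN address-family
vtysh -c 'show evpn vni'                 # VNIs known to zebra
vtysh -c 'show evpn mac vni all'         # MAC addresses learned per VNI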

Hope this helps!

Wido

> 
> Tue, 23 Oct 2018, 8:34 Simon Weller <swel...@ena.com.invalid>:
> 
>> Linux native VXLAN uses multicast and each host has to participate in
>> multicast in order to see the VXLAN networks. We haven't tried using PIM
>> across an L3 boundary with ACS, although it will probably work fine.
>>
>> Another option is to use an L3 VTEP, but right now there is no native
>> support for that in CloudStack's VXLAN implementation, although we've
>> thought about proposing it as a feature.
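
For what it's worth, a quick way to check that a host has actually
joined the multicast groups for its VXLAN networks (bond0.950 is just
the example interface used later in this thread):

ip maddress show dev bond0.950   # multicast groups joined on the parent interface
cat /proc/net/igmp               # IGMP memberships per interface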
>>
>>
>> ________________________________
>> From: Wido den Hollander <w...@widodh.nl>
>> Sent: Tuesday, October 23, 2018 7:17 AM
>> To: dev@cloudstack.apache.org; Simon Weller
>> Subject: Re: VXLAN and KVm experiences
>>
>>
>>
>> On 10/23/18 1:51 PM, Simon Weller wrote:
>>> We've also been using VXLAN on KVM for all of our isolated VPC guest
>>> networks for quite a long time now. As Andrija pointed out, make sure
>>> you increase the igmp_max_memberships param and also put an IP address
>>> on each host's VXLAN interface, in the same subnet for all hosts that
>>> will share networking, or multicast won't work.
>>>
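
As a rough illustration of those two points (the limit value, subnet and
interface name below are made-up examples, not values from this thread):

sysctl -w net.ipv4.igmp_max_memberships=200   # raise the IGMP membership limit
ip addr add 192.0.2.11/24 dev cloudbr1        # same subnet on every host that shares these networks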
>>
>> Thanks! So you are saying that all hypervisors need to be in the same L2
>> network or are you routing the multicast?
>>
>> My idea was that each POD would be an isolated Layer 3 domain and that a
>> VNI would span over the different Layer 3 networks.
>>
>> I don't like STP and other Layer 2 loop-prevention systems.
>>
>> Wido
>>
>>>
>>> - Si
>>>
>>>
>>> ________________________________
>>> From: Wido den Hollander <w...@widodh.nl>
>>> Sent: Tuesday, October 23, 2018 5:21 AM
>>> To: dev@cloudstack.apache.org
>>> Subject: Re: VXLAN and KVm experiences
>>>
>>>
>>>
>>> On 10/23/18 11:21 AM, Andrija Panic wrote:
>>>> Hi Wido,
>>>>
>>>> I have "pioneered" this one in production for last 3 years (and
>> suffered a
>>>> nasty pain of silent drop of packages on kernel 3.X back in the days
>>>> because of being unaware of max_igmp_memberships kernel parameters, so I
>>>> have updated the manual long time ago).
>>>>
>>>> I never had any issues (besides the above nasty one...) and it works
>>>> very well.
>>>
>>> That's what I want to hear!
>>>
>>>> To avoid the issue I described above, you should increase
>>>> igmp_max_memberships (/proc/sys/net/ipv4/igmp_max_memberships) -
>>>> otherwise, with more than 20 vxlan interfaces, some of them will stay
>>>> in a down state and drop traffic hard (with a proper message in
>>>> agent.log) on kernel > 4.0, or suffer silent, random packet drops on
>>>> kernel 3.X... Also pay attention to the MTU size - anyway, everything
>>>> is in the manual (I updated everything I thought was missing), so
>>>> please check it.
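
A small sketch for checking whether a host is near that limit (the new
value is just an example):

cat /proc/sys/net/ipv4/igmp_max_memberships       # the default is 20
ip -d link show type vxlan | grep -c 'vxlan id'   # vxlan devices on this host (each multicast-mode one joins a group)
sysctl -w net.ipv4.igmp_max_memberships=200       # raise it; persist via /etc/sysctl.d/ to survive reboots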
>>>>
>>>
>>> Yes, the underlying network will all be 9000 bytes MTU.
>>>
>>>> Our example setup:
>>>>
>>>> We have e.g. bond0.950 as the main VLAN which will carry all vxlan
>>>> "tunnels" - so this is defined as the KVM traffic label. In our case
>>>> it didn't make sense to use a bridge on top of this bond0.950 (as the
>>>> traffic label) - you can test it on your own - since that bridge is
>>>> only used to extract the child bond0.950 interface name. Then, based
>>>> on the vxlan ID, ACS will provision vxlan...@bond0.xxx and join this
>>>> new vxlan interface to a NEW bridge it creates (and then of course the
>>>> vNIC goes to this new bridge), so the original bridge (to which
>>>> bond0.xxx belonged) is not used for anything.
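
For illustration, the stock multicast-based provisioning comes down to
roughly this per VNI (simplified; the exact commands live in
CloudStack's modifyvxlan.sh, and the group address here simply mirrors
the sample output just below):

ip link add vxlan867 type vxlan id 867 group 239.0.3.99 dev bond0.950
ip link add brvx-867 type bridge
ip link set vxlan867 master brvx-867
ip link set vxlan867 up
ip link set brvx-867 up
# the VM's vnet interface is then plugged into brvx-867 by the agent/libvirt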
>>>>
>>>
>>> Clear, I indeed thought something like that would happen.
>>>
>>>> Here is a sample of the above for vxlan 867, used for tenant isolation:
>>>>
>>>> root@hostname:~# brctl show brvx-867
>>>>
>>>> bridge name     bridge id               STP enabled     interfaces
>>>> brvx-867        8000.2215cfce99ce       no              vnet6
>>>>                                                         vxlan867
>>>>
>>>> root@hostname:~# ip -d link show vxlan867
>>>>
>>>> 297: vxlan867: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8142 qdisc noqueue
>>>> master brvx-867 state UNKNOWN mode DEFAULT group default qlen 1000
>>>>     link/ether 22:15:cf:ce:99:ce brd ff:ff:ff:ff:ff:ff promiscuity 1
>>>>     vxlan id 867 group 239.0.3.99 dev bond0.950 port 0 0 ttl 10 ageing 300
>>>>
>>>> root@ix1-c7-2:~# ifconfig bond0.950 | grep MTU
>>>>           UP BROADCAST RUNNING MULTICAST  MTU:8192  Metric:1
>>>>
>>>> So note how the vxlan interface has an MTU 50 bytes smaller than the
>>>> bond0.950 parent interface (which could affect traffic inside the VM) -
>>>> so jumbo frames are needed on the parent interface anyway (bond0.950 in
>>>> the example above, with a minimum of 1550 MTU).
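
For example, on the parent interface (the overhead is ~50 bytes with an
IPv4 underlay, ~70 bytes with IPv6):

ip link set dev bond0.950 mtu 9000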
>>>>
>>>
>>> Yes, thanks! We will be using 1500 MTU inside the VMs, so all the
>>> networks underneath will be ~9k.
>>>
>>>> Ping me if more details needed, happy to help.
>>>>
>>>
>>> Awesome! We'll be doing a PoC rather soon. I'll come back with our
>>> experiences later.
>>>
>>> Wido
>>>
>>>> Cheers
>>>> Andrija
>>>>
>>>> On Tue, 23 Oct 2018 at 08:23, Wido den Hollander <w...@widodh.nl>
>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I just wanted to know if there are people out there using KVM with
>>>>> Advanced Networking and using VXLAN for different networks.
>>>>>
>>>>> Our main goal would be to spawn a VM and based on the network the NIC
>> is
>>>>> in attach it to a different VXLAN bridge on the KVM host.
>>>>>
>>>>> It seems to me that this should work, but I just wanted to check and
>> see
>>>>> if people have experience with it.
>>>>>
>>>>> Wido
>>>>>
>>>>
>>>>
>>>
>>
> 
