Hi,

On 12/28/18 5:43 PM, Ivan Kudryavtsev wrote:
> Wido, that's interesting.
>
> Do you think that the Cumulus-based switches with BGP inside have an
> advantage over classic OSPF-based routing switches and separate
> multihop MP-BGP route-servers for VNI propagation?
>

I don't know. We do not use OSPF anywhere in our network. We are an
(i)BGP-only network.

We want to use as much open software as possible: buy the switches we
like and then add ONIE-based software like Cumulus on top.

> I'm thinking about pure L3 OSPF-based backend networks for management
> and storage, where CloudStack uses bridges on dummy interfaces with an
> IP assigned, while the real NICs use utility IP addresses in several
> OSPF networks and all those target IPs are distributed with OSPF.
>
> Next, VNIs are created over the bridges and their information is
> distributed over BGP.
>
> This approach helps to implement fault tolerance and multi-path routes
> with the standard L3 stack, without xSTP, VCS, etc., and decreases the
> broadcast domains.
>
> Any thoughts?
>

I wouldn't know for sure, we haven't looked into this yet.

Again, our plan, though not set in stone, is:

- Unnumbered BGP (IPv6 link-local) to all hypervisors
- Link balancing using ECMP
- BGP+EVPN for VXLAN VNI distribution
- Use a static VNI for the CloudStack POD IPv4
- Adapt the *modifyvxlan.sh* script to suit our needs

This way the transport of traffic will all be done in an IPv6-only
fashion. IPv4 to the hypervisors (POD traffic and NFS secondary
storage) is all handled by a VXLAN device we create manually on them.

Wido

>
> Fri, Dec 28, 2018 at 05:34, Wido den Hollander <w...@widodh.nl>:
>
> On 10/23/18 2:54 PM, Ivan Kudryavtsev wrote:
> > Doesn't a solution like this work seamlessly for large VXLAN
> > networks?
> >
> > https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn
> >
>
> This is what we are looking into right now.
>
> As CloudStack executes *modifyvxlan.sh* prior to starting an Instance,
> it would be just a matter of replacing this script with a version
> which does the EVPN for us.
>
> Our routers will probably be 36x100G SuperMicro Bare Metal switches
> running Cumulus.
>
> Using unnumbered BGP over IPv6 we'll provide network connectivity to
> the hypervisors.
>
> Using FRR and EVPN we'll be able to enable VXLAN on the hypervisors
> and route traffic.
>
> As these things seem to be very use-case specific, I don't see how we
> can integrate this into CloudStack in a generic way.
>
> The *modifyvxlan.sh* script gets the VNI as an argument, so anybody
> can adapt it to their own needs for their specific environment.
>
> Wido
>
> > Tue, Oct 23, 2018, 8:34 Simon Weller <swel...@ena.com.invalid>:
> >
> >> Linux native VXLAN uses multicast and each host has to participate
> >> in multicast in order to see the VXLAN networks. We haven't tried
> >> using PIM across an L3 boundary with ACS, although it will probably
> >> work fine.
> >>
> >> Another option is to use an L3 VTEP, but right now there is no
> >> native support for that in CloudStack's VXLAN implementation,
> >> although we've thought about proposing it as a feature.
> >>
> >>
> >> ________________________________
> >> From: Wido den Hollander <w...@widodh.nl>
> >> Sent: Tuesday, October 23, 2018 7:17 AM
> >> To: dev@cloudstack.apache.org; Simon Weller
> >> Subject: Re: VXLAN and KVm experiences
> >>
> >> On 10/23/18 1:51 PM, Simon Weller wrote:
> >>> We've also been using VXLAN on KVM for all of our isolated VPC
> >>> guest networks for quite a long time now. As Andrija pointed out,
> >>> make sure you increase the max_igmp_memberships param and also put
> >>> an IP address on each host's VXLAN interface, in the same subnet
> >>> for all hosts that will share networking, or multicast won't work.
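
For reference, Simon's two prerequisites above come down to something
like the following on each hypervisor. This is only a minimal sketch:
the value 200 is an arbitrary ceiling, the interface name bond0.950 is
borrowed from Andrija's example further down, and 192.0.2.0/24 is a
placeholder for whatever underlay subnet the hosts actually share.

# raise the IGMP membership limit (the kernel default is 20), otherwise
# VXLAN devices beyond the 20th cannot join their multicast group
echo "net.ipv4.igmp_max_memberships = 200" > /etc/sysctl.d/99-vxlan.conf
sysctl -p /etc/sysctl.d/99-vxlan.conf

# give the interface that carries the VXLAN traffic an address in a
# subnet shared by all hypervisors, so the multicast traffic can flow
ip addr add 192.0.2.11/24 dev bond0.950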
> >>>
> >> Thanks! So you are saying that all hypervisors need to be in the
> >> same L2 network, or are you routing the multicast?
> >>
> >> My idea was that each POD would be an isolated Layer 3 domain and
> >> that a VNI would span over the different Layer 3 networks.
> >>
> >> I don't like STP and other Layer 2 loop-prevention systems.
> >>
> >> Wido
> >>
> >>>
> >>> - Si
> >>>
> >>>
> >>> ________________________________
> >>> From: Wido den Hollander <w...@widodh.nl>
> >>> Sent: Tuesday, October 23, 2018 5:21 AM
> >>> To: dev@cloudstack.apache.org
> >>> Subject: Re: VXLAN and KVm experiences
> >>>
> >>> On 10/23/18 11:21 AM, Andrija Panic wrote:
> >>>> Hi Wido,
> >>>>
> >>>> I have "pioneered" this one in production for the last 3 years
> >>>> (and suffered the nasty pain of a silent drop of packets on
> >>>> kernel 3.X back in the days, because of being unaware of the
> >>>> max_igmp_memberships kernel parameter, so I updated the manual a
> >>>> long time ago).
> >>>>
> >>>> I never had any issues (besides the nasty one above...) and it
> >>>> works very well.
> >>>
> >>> That's what I want to hear!
> >>>
> >>>> To avoid the issue I described above, you should increase
> >>>> max_igmp_memberships (/proc/sys/net/ipv4/igmp_max_memberships) -
> >>>> otherwise, with more than 20 vxlan interfaces, some of them will
> >>>> stay in a down state and have a hard traffic drop (with a proper
> >>>> message in agent.log) on kernel >4.0 (or a silent, bitchy random
> >>>> packet drop on kernel 3.X...). Also pay attention to the MTU size
> >>>> - anyway, everything is in the manual (I updated everything I
> >>>> thought was missing), so please check it.
> >>>>
> >>>
> >>> Yes, the underlying network will all be 9000 bytes MTU.
> >>>
> >>>> Our example setup:
> >>>>
> >>>> We have e.g. bond0.950 as the main VLAN which will carry all
> >>>> vxlan "tunnels" - so this is defined as the KVM traffic label. In
> >>>> our case it didn't make sense to use a bridge on top of this
> >>>> bond0.950 (as the traffic label) - you can test it on your own -
> >>>> since this bridge is used only to extract the child bond0.950
> >>>> interface name; then, based on the vxlan ID, ACS will provision
> >>>> vxlan...@bond0.xxx and join this new vxlan interface to a NEW
> >>>> bridge it creates (and then of course the vNIC goes to this new
> >>>> bridge), so the original bridge (to which bond0.xxx belonged) is
> >>>> not used for anything.
> >>>>
> >>>
> >>> Clear, I indeed thought something like that would happen.
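
To make the provisioning Andrija describes concrete: per VNI it
effectively amounts to the following commands. This is a simplified
sketch, not the actual modifyvxlan.sh script; it reuses the VNI,
multicast group and parent interface from Andrija's vxlan 867 example
just below (867 = 3 * 256 + 99, which matches the group 239.0.3.99 in
that example).

# VXLAN device on top of the traffic-label interface; the multicast
# group carries the BUM traffic between the hypervisors
ip link add vxlan867 type vxlan id 867 group 239.0.3.99 dev bond0.950 ttl 10

# fresh bridge that the guest vNIC (e.g. vnet6) gets plugged into
ip link add brvx-867 type bridge
ip link set vxlan867 master brvx-867
ip link set vxlan867 up
ip link set brvx-867 up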
> >>>
> >>>> Here is a sample from the above for vxlan 867, used for tenant
> >>>> isolation:
> >>>>
> >>>> root@hostname:~# brctl show brvx-867
> >>>> bridge name     bridge id               STP enabled     interfaces
> >>>> brvx-867        8000.2215cfce99ce       no              vnet6
> >>>>                                                         vxlan867
> >>>>
> >>>> root@hostname:~# ip -d link show vxlan867
> >>>> 297: vxlan867: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8142 qdisc noqueue
> >>>>     master brvx-867 state UNKNOWN mode DEFAULT group default qlen 1000
> >>>>     link/ether 22:15:cf:ce:99:ce brd ff:ff:ff:ff:ff:ff promiscuity 1
> >>>>     vxlan id 867 group 239.0.3.99 dev bond0.950 port 0 0 ttl 10 ageing 300
> >>>>
> >>>> root@ix1-c7-2:~# ifconfig bond0.950 | grep MTU
> >>>>           UP BROADCAST RUNNING MULTICAST  MTU:8192  Metric:1
> >>>>
> >>>> So note how the vxlan interface has a 50-byte smaller MTU than the
> >>>> bond0.950 parent interface (which could affect traffic inside the
> >>>> VM) - so jumbo frames are needed anyway on the parent interface
> >>>> (bond0.950 in the example above, with a minimum MTU of 1550).
> >>>>
> >>>
> >>> Yes, thanks! We will be using a 1500 MTU inside the VMs, so all
> >>> the networks underneath will be ~9k.
> >>>
> >>>> Ping me if more details are needed, happy to help.
> >>>>
> >>>
> >>> Awesome! We'll be doing a PoC rather soon. I'll come back with our
> >>> experiences later.
> >>>
> >>> Wido
> >>>
> >>>> Cheers
> >>>> Andrija
> >>>>
> >>>> On Tue, 23 Oct 2018 at 08:23, Wido den Hollander <w...@widodh.nl>
> >>>> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I just wanted to know if there are people out there using KVM
> >>>>> with Advanced Networking and using VXLAN for different networks.
> >>>>>
> >>>>> Our main goal would be to spawn a VM and, based on the network
> >>>>> the NIC is in, attach it to a different VXLAN bridge on the KVM
> >>>>> host.
> >>>>>
> >>>>> It seems to me that this should work, but I just wanted to check
> >>>>> and see if people have experience with it.
> >>>>>
> >>>>> Wido
> >>>>>
> >>>>
> >>>
> >>
>
> --
> With best regards, Ivan Kudryavtsev
> Bitworks LLC
> Cell RU: +7-923-414-1515
> Cell USA: +1-201-257-1512
> WWW: http://bitworks.software/
>
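
Tying this back to the plan Wido outlines near the top of the thread
(BGP+EVPN for VNI distribution, adapting modifyvxlan.sh): with EVPN the
multicast group goes away and FRR distributes VTEPs and MAC addresses
over BGP instead. The following is only a rough sketch of what an
adapted script might do, not the actual implementation - the VTEP
address is a placeholder loopback IP, and it assumes FRR is already
peered and configured to advertise local VNIs (e.g. with
advertise-all-vni).

VNI=867                 # the VNI the agent hands to the script (867 reused from the example above)
VTEP_IP=203.0.113.10    # placeholder loopback address announced via BGP

# no multicast group: remote VTEPs and MAC addresses are learned via
# BGP EVPN rather than flood-and-learn
ip link add vxlan${VNI} type vxlan id ${VNI} local ${VTEP_IP} dstport 4789 nolearning

ip link add brvx-${VNI} type bridge
ip link set vxlan${VNI} master brvx-${VNI}
ip link set vxlan${VNI} up
ip link set brvx-${VNI} up

Keep the VXLAN overhead in mind here as well: 50 bytes over an IPv4
underlay (14 Ethernet + 20 IPv4 + 8 UDP + 8 VXLAN), 70 bytes over IPv6,
which is where the 8142-vs-8192 MTU difference in Andrija's example
comes from.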