Hi Daniel,

On Tue, Mar 16, 2021, at 15:19, Daniel Alvarez Sanchez wrote:
> 
> 
> On Tue, Mar 16, 2021 at 2:45 PM Luis Tomas Bolivar <ltoma...@redhat.com> 
> wrote:
> > Of course we are fully open to redesigning it if there is a better 
> > approach! And that was indeed the intention when linking to the current 
> > efforts: to figure out if that was a "valid" way of doing it, and how it 
> > can be improved/redesigned. The main idea behind the current design was 
> > not to need modifications to core OVN, as well as to minimize the 
> > complexity, i.e., not having to implement another kind of controller for 
> > managing the extra OF flows.
> > 
> > Regarding the metadata/localport, I have a couple of questions, mainly due 
> > to me not knowing enough about ovn/localport:
> > 1) Isn't the metadata managed through a namespace? At the end of the day 
> > that is also visible from the hypervisor, as well as the OVS bridges.
> > 2) Another difference is that we are using BGP ECMP and therefore not 
> > associating any nic/bond to br-ex, and that is why we require some 
> > rules/routes to redirect the traffic to br-ex.
> > 
> > Thanks for your input! Really appreciated!
> > 
> > Cheers,
> > Luis
> > 
> > On Tue, Mar 16, 2021 at 2:22 PM Krzysztof Klimonda 
> > <kklimo...@syntaxhighlighted.com> wrote:
> >> Would it make more sense to reverse this part of the design? I was 
> >> thinking of having each chassis use its own IPv4/IPv6 address as the 
> >> next-hop in announcements, with OF flows installed to direct BGP control 
> >> packets over to the host system, similar to how localport is used today 
> >> for neutron's metadata service (although I'll admit that I haven't looked 
> >> into how this integrates with dpdk and offload).
> 
> Hi Krzysztof, not sure I follow your suggestion but let me see if I do. 
> With this PoC, the kernel will do:
> 
> 1) Routing between the physical interface and OVN
> 2) Proxy ARP
> 3) Proxy NDP
> 
> Also FRR will advertise directly connected routes based on the IPs 
> configured on dummy interfaces.
> All this comes with the benefit that no changes are required in the CMS 
> or OVN itself.
> 
> If I understand your proposal correctly, you would like to do 1), 2) and 3) 
> in OpenFlow, so an agent running on all compute nodes would be responsible 
> for this? Or do you propose adding extra OVN resources in a similar way to 
> what ovn-kubernetes does today [0], in which case:

Yes, that seems to be a prerequisite (or one of the prerequisites) for keeping 
the current DPDK / offload capabilities, as far as I understand. By proxy 
ARP/NDP I think you mean responding to ARP and NDP requests on behalf of the 
system where FRR is running?
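
For reference, this is roughly the kernel-side plumbing I imagine the agent
ends up doing per exposed address - a hand-written sketch based on the blog
posts, not the actual bgp-agent code, and the interface names/addresses are
made up:

#!/usr/bin/env python3
"""Sketch (not the PoC code) of the kernel-side setup described above:
enable proxy ARP/NDP towards the fabric and steer a tenant address into
the OVN provider bridge. "eth1", "br-ex" and the address are examples."""
import subprocess

def run(cmd):
    # Run a command and fail loudly so misconfiguration is visible.
    subprocess.run(cmd, check=True)

def expose_vm_ip(vm_ip, bridge="br-ex", nic="eth1"):
    # Answer ARP/NDP on the fabric-facing NIC for addresses we do not own.
    run(["sysctl", "-w", f"net.ipv4.conf.{nic}.proxy_arp=1"])
    run(["sysctl", "-w", f"net.ipv6.conf.{nic}.proxy_ndp=1"])
    # Route the exposed address towards the provider bridge so the kernel
    # hands the traffic to OVS/OVN instead of forwarding it elsewhere.
    run(["ip", "route", "replace", f"{vm_ip}/32", "dev", bridge])

if __name__ == "__main__":
    expose_vm_ip("10.0.0.42")  # purely illustrative tenant address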

As for whether to go the ovn-kubernetes way and try to implement it with 
existing primitives, or to add BGP support directly into OVN, I feel this 
should be a core feature of OVN itself and not something built on top of it by 
careful placement of logical switches, routers and ports. That would also help 
with management (you would configure a new BGP connection by modifying the 
northbound DB) and simplify troubleshooting when something is not working as 
expected.
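
To make the northbound DB part more concrete, something along these lines is
what I have in mind - purely hypothetical of course, OVN has no BGP tables or
CLI for this today and every name below is invented for the example:

# Hypothetical northbound data, only to illustrate "configure BGP by
# modifying the northbound DB". None of these tables/columns exist in OVN.
hypothetical_bgp_speaker = {
    "name": "compute-0",
    "local_as": 64512,
    "announce": ["fips", "provider-subnets", "virtual-ips"],
    "neighbors": [
        {"address": "172.16.0.1", "remote_as": 64512},  # ToR A
        {"address": "172.16.0.2", "remote_as": 64512},  # ToR B (ECMP)
    ],
}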

> 
> - Create an OVN Gateway router and connect it to the provider Logical 
> Switch
> - Advertise host routes through the Gateway Router IP address for each 
> node. This would consume one IP address per provider network per node

That seems excessive - why would we need one IP address per provider network 
per node? Shouldn't a single IP per node be enough, even if we go with your 
proposal of reusing existing OVN resources? If we do that, a separate "service 
subnet" per "external network" could be used to provide connectivity between 
the BGP router and the OVN chassis (so that the next hop can be configured 
correctly). Burning IP addresses from all provider networks seems wasteful, 
given that some of them are going to be public, and public addresses are 
getting pretty expensive at the moment.
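
A quick back-of-the-envelope example with made-up numbers to show why this
adds up:

# Made-up numbers, just to illustrate the address consumption argument.
nodes = 200
provider_networks = 4        # e.g. 2 public + 2 internal provider networks

per_network_per_node = nodes * provider_networks  # one IP per network per node
per_node_only = nodes                             # one "service subnet" IP per node

print(per_network_per_node)  # 800 addresses, some from public ranges
print(per_node_only)         # 200 addresses from a dedicated service subnet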

> - Some external entity to configure ECMP routing to the ToRs

(we're still talking about implementing it via the neutron CMS, right?)
This is probably out of scope for OVN or neutron anyway? I'd assume the ToRs 
are configured before the compute node is deployed.

> - Who creates/manages the infra resources? Onboarding new hypervisors 
> requires IPAM and more

Right, that seems to be another reason to do that "natively" in OVN.

> - OpenStack provides flexibility to its users to customize their own 
> networking (more than ovn-kubernetes does, I believe). Mixing user-created 
> network resources with infra resources in the same OVN cluster is 
> non-trivial (e.g. maintenance tasks, migration to OVN, ...)

I'm not sure I follow, but if you mean that in the second scenario (where BGP 
support is implemented using existing OVN resources by strategic placement of 
the LSs etc.) too much of the "infra" topology becomes visible to neutron (and 
possibly neutron users), then I wholeheartedly agree - I don't think this is 
the way to implement it, and the implementation should be done entirely on the 
OVN side.

> - Scaling issues due to the larger number of resources/flows?

Right, that's also one of my concerns when we talk about implementing this 
with current resources - it seems we'd have to create a decent number of 
resources per chassis, and that translates into extra flows in OVS (I also 
worry about the processing of new packets, which would probably increase the 
number of flow lookups).
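
For what it's worth, this is how I'd keep an eye on that during testing - just
counting flows on the chassis before and after adding the per-chassis
resources (assuming the integration bridge is br-int):

#!/usr/bin/env python3
"""Count OpenFlow flows installed on a bridge - a crude way to watch flow
growth as per-chassis resources are added. Assumes ovs-ofctl is available
and the bridge is named br-int."""
import subprocess

def flow_count(bridge="br-int"):
    out = subprocess.run(["ovs-ofctl", "dump-flows", bridge],
                         capture_output=True, text=True, check=True)
    # The first line of dump-flows output is the reply header, not a flow.
    return max(len(out.stdout.splitlines()) - 1, 0)

if __name__ == "__main__":
    print(f"{flow_count()} flows currently installed on br-int")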

> 
> [0] 
> https://raw.githubusercontent.com/ovn-org/ovn-kubernetes/master/docs/design/current_ovn_topology.svg
> 
> >> This way we can also simplify the host's networking configuration, as 
> >> extra routing rules and ARP entries are no longer needed (I think it 
> >> would be preferable, from a security perspective, for the hypervisor not 
> >> to have direct access to overlay networks, which seems to be the case 
> >> when you use rules like that).
> 
> I agree that it'd simplify the host networking but it will 
> overcomplicate the rest (unless I'm missing something, which is more 
> than possible :)

Yes, indeed - that's definitely adding a lot of work to be done on the OVN 
side, but I think the result would be a more coherent system that could even 
be extended later - we could, for example, try implementing A/A load balancing 
by announcing the same IP address from multiple chassis.
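
To sketch what I mean, reusing the PoC's own trick of letting FRR redistribute
connected routes: add the same VIP on a dummy interface on every chassis that
should attract the traffic and let BGP ECMP on the fabric spread it (the
interface name and address below are made up):

#!/usr/bin/env python3
"""Sketch of anycast-style A/A: the same service VIP is added on a dummy
interface on each participating chassis, so FRR ("redistribute connected")
announces it from all of them and the fabric load-balances via ECMP."""
import subprocess

def announce_anycast_vip(vip, dummy="bgp-anycast"):
    # Create the dummy interface if it does not exist yet (ignore "exists").
    subprocess.run(["ip", "link", "add", dummy, "type", "dummy"], check=False)
    subprocess.run(["ip", "link", "set", dummy, "up"], check=True)
    # Adding the VIP makes it a connected route that FRR can announce.
    subprocess.run(["ip", "addr", "replace", f"{vip}/32", "dev", dummy],
                   check=True)

if __name__ == "__main__":
    announce_anycast_vip("192.0.2.10")  # same VIP on every chassis involved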

I think a BGP implementation is of most use for larger deployments that 
already face issues with the current OVN implementation - be it OF flow 
explosion in ovs-vswitchd, having to process tons of ARP requests reaching the 
chassis, or trying to scale the per-flow (and total) throughput of faster 
NICs.

> 
> Thanks a lot for the discussion,
> Daniel
>  
> >> 
> >> --
> >>   Krzysztof Klimonda
> >>   kklimo...@syntaxhighlighted.com
> >> 
> >> 
> >> 
> >> On Tue, Mar 16, 2021, at 13:56, Luis Tomas Bolivar wrote:
> >>> Hi Krzysztof,
> >>> 
> >>> On Tue, Mar 16, 2021 at 12:54 PM Krzysztof Klimonda 
> >>> <kklimo...@syntaxhighlighted.com> wrote:
> >>>> Hi Luis,
> >>>> 
> >>>> I haven't yet had time to give it a try in our lab, but from reading 
> >>>> your blog posts I have a quick question. How does it work when either 
> >>>> DPDK or NIC offload is used for OVN traffic? It seems you are 
> >>>> (de-)encapsulating traffic on chassis nodes by routing it through the 
> >>>> kernel - is this the current design or just an artifact of the PoC code?
> >>> 
> >>> You are correct, that is a limitation as we are using kernel routing for 
> >>> N/S traffic, so DPDK/NIC offloading cannot be used. That said, the E/W 
> >>> traffic still uses the OVN overlay and Geneve tunnels.
> >>> 
> >>> 
> >>>> 
> >>>> 
> >>>> --
> >>>>   Krzysztof Klimonda
> >>>>   kklimo...@syntaxhighlighted.com
> >>>> 
> >>>> 
> >>>> 
> >>>> On Mon, Mar 15, 2021, at 11:29, Luis Tomas Bolivar wrote:
> >>>>> Hi Sergey, all,
> >>>>> 
> >>>>> In fact we are working on a solution based on FRR where a (python) 
> >>>>> agent reads from the OVN SB DB (port binding events) and triggers FRR 
> >>>>> so that the needed routes get advertised. It leverages kernel 
> >>>>> networking to redirect the traffic to the OVN overlay, and therefore 
> >>>>> does not require any modifications to OVN itself (at least for now). 
> >>>>> The PoC code can be found here: https://github.com/luis5tb/bgp-agent
> >>>>> 
> >>>>> And there is a series of blog posts related to how to use it on 
> >>>>> OpenStack and how it works:
> >>>>> - OVN-BGP agent introduction: 
> >>>>> https://ltomasbo.wordpress.com/2021/02/04/openstack-networking-with-bgp/
> >>>>> - How to set it up in a DevStack environment: 
> >>>>> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/
> >>>>> - In-depth traffic flow inspection: 
> >>>>> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-in-depth-traffic-flow-inspection/
> >>>>> 
> >>>>> We are thinking that possible next steps, if the community is 
> >>>>> interested, could be adding multitenancy support (e.g., through EVPN), 
> >>>>> as well as defining the best API for deciding what to expose through 
> >>>>> BGP. It would be great to get some feedback on it!
> >>>>> 
> >>>>> Cheers,
> >>>>> Luis
> >>>>> 
> >>>>> On Fri, Mar 12, 2021 at 8:09 PM Dan Sneddon <dsned...@redhat.com> wrote:
> >>>>>> 
> >>>>>> 
> >>>>>> On 3/10/21 2:09 PM, Sergey Chekanov wrote:
> >>>>>> > I am looking at GoBGP (a BGP implementation in Go) + go-openvswitch 
> >>>>>> > to communicate with the OVN Northbound Database right now, but I'm 
> >>>>>> > not sure yet. FRR, I think, will be too heavy for it...
> >>>>>> > 
> >>>>>> > On 10.03.2021 05:05, Raymond Burkholder wrote:
> >>>>>> >> You could look at it from a Free Range Routing perspective.  I've 
> >>>>>> >> used 
> >>>>>> >> it in combination with OVS for layer 2 and layer 3 handling.
> >>>>>> >>
> >>>>>> >> On 3/8/21 3:40 AM, Sergey Chekanov wrote:
> >>>>>> >>> Hello!
> >>>>>> >>>
> >>>>>> >>> Are there any plans to support BGP EVPN for extending virtual 
> >>>>>> >>> networks to ToR hardware switches?
> >>>>>> >>> Or why is it a bad idea?
> >>>>>> >>>
> >>>>>> >>
> >>>>>> > 
> >>>>>> > 
> >>>>>> 
> >>>>>> FRR is delivered as a set of daemons which perform specific functions. 
> >>>>>> If you only need BGP functionality, you can just run bgpd. The zebra 
> >>>>>> daemon adds routing exchange between BGP and the kernel. The vtysh 
> >>>>>> shell provides a command-line interface to interact with the FRR 
> >>>>>> processes. There is also a Bidirectional Forwarding Detection (BFD) 
> >>>>>> daemon that can be run to detect unidirectional forwarding failures. 
> >>>>>> Other daemons provide other services and protocols. For this reason, I 
> >>>>>> felt that it was lightweight enough to just run a few daemons in a 
> >>>>>> container.
> >>>>>> 
> >>>>>> A secondary concern for my use case was support on Red Hat Enterprise 
> >>>>>> Linux, which will be adding FRR to the supported packages shortly.
> >>>>>> 
> >>>>>> I'm curious to hear any input that anyone has on FRR compared with 
> >>>>>> GoBGP 
> >>>>>> and other daemons. Please feel free to respond on-list if it involves 
> >>>>>> OVS, or off-list if not. Thanks.
> >>>>>> 
> >>>>>> -- 
> >>>>>> Dan Sneddon         |  Senior Principal Software Engineer
> >>>>>> dsned...@redhat.com |  redhat.com/cloud
> >>>>>> dsneddon:irc        |  @dxs:twitter
> >>>>>> 
> >>>>> 
> >>>>> 
> >>>>> -- 
> >>>>> LUIS TOMÁS BOLÍVAR
> >>>>> Principal Software Engineer
> >>>>> Red Hat
> >>>>> Madrid, Spain
> >>>>> ltoma...@redhat.com   
> >>>>>  
> >>>>> 
> >>>> 
> >>> 
> >>> 
> >>> -- 
> >>> LUIS TOMÁS BOLÍVAR
> >>> Principal Software Engineer
> >>> Red Hat
> >>> Madrid, Spain
> >>> ltoma...@redhat.com   
> >>>  
> >> 
> > 
> > 
> > -- 
> > LUIS TOMÁS BOLÍVAR
> > Principal Software Engineer
> > Red Hat
> > Madrid, Spain
> > ltoma...@redhat.com   
> >  

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss