Tue, Aug 26, 2014 at 03:50:21PM CEST, ro...@cumulusnetworks.com wrote:
>On 8/25/14, 3:50 PM, Thomas Graf wrote:
>>On 08/25/14 at 12:15pm, Jamal Hadi Salim wrote:
>>>On 08/25/14 10:17, Thomas Graf wrote:
>>>>On 08/25/14 at 09:53am, Jamal Hadi Salim wrote:
>>>>fdb_add() *is* flow based. At least in my understanding, the whole
>>>>point here is to extend the idea of fdb_add() and make it understand
>>>>L2-L4 in a more generic way for the most common protocols.
>>>>
>>>>The reason fdb_add() is not reused is because it is Netlink specific
>>>>and only suitable for User -> HW offload. Kernel -> HW offload is
>>>>technically possible but not clean.
>>>>
>>>I dont think we have a problem handling any of this today.
>>Yes we do. It's restricted to L2 and we can't extend it easily
>>because it is based on NDA_*. The use of Netlink makes in-kernel
>>usage a pain. To me this is the sole reason for not using fdb_add()
>>in the first place. It seems absolutely clear though that fdb_add()
>>should be removed after the more generic ndo is in place providing
>>a superset of what fdb_add() can do today.
>>
>>>This is where our (shall i say strong) disagreement is.
>>>I think you will find it non-trivial to show me how you can
>>>actually take the simple L2 bridge and map it to a "flow".
>>>Since your starting point is "everything can be represented via a flow
>>>and some table" - we are at a crosspath.
>>OK, let me do the convertion for you:
>>
>>NDA_DST        unused
>>NDA_LLADDR     sw_flow_key.eth.dst
>>NDA_CACHEINFO  unused
>>NDA_PROBES     unused
>>NDA_VLAN       sw_flow_key.eth.tci
>>NDA_PORT       unused
>>NDA_VNI        sw_flow_key.tun_key.tun_id
>>NDA_IFINDEX    sw_flow_key.phys.in_port
>>NDA_MASTER     unused
>>
>>>The tc filter API seems to be doing just that.
>>>You have different types of classifiers - the h/w may not be able
>>>to support some classifier types - but that is a capability discovery
>>>challenge.
>>Agreed but tc is only one out of many possible existing interfaces
>>we have. macvtap (given we want to extend beyond L2), routing,
>>OVS, bridge and eventually even things like a team device can and
>>should make use of offloads.
>>
>>>I am saying two things:
>>>1) There are a few "fundamental" interfaces; L2 and L3 being some.
>>>Add crypto offload and a few i mentioned in my presentation. We
>>Can you share that preso? I was not present.
>>
>>>know how to do those. example; there is nothing i cant do with
>>>the rtmsg that is L3. or the fdb/port/vlan filter for L2.
>>>This flow thing should stay out of those.
>>Let me remind you about the name of the structure behind all L3
>>forwarding decisions:
>>
>>    struct flowi4 {
>>        [...]
>>    }
>>
>>Adding a route means adding a flow. Can we please stop the flow
>>bashing? The concept of a flow is very generic, well known and already
>>very present in the kernel.
>>
>>The sw_flow_key proposed comes close to flowi4. Some fields are
>>different. They can eventually get merged. The strict IPv4/IPv6
>>separation is what makes it non obvious and probably why Jiri chose
>>the OVS representation. If you say rtmsg is complete then that clearly
>>is not the case. In particular VTEP fields, ARP, and TCP flags are
>>clearly missing for many uses.
>>
>>Again, I'm not saying flow is the ultimate answer to everything. It
>>is not. But a lot of hardware out there is aware of flows in combination
>>with some form of action execution. Non flow based hardware can have
>>their own classifier.
>>
>>>2) The flow thing should allow a variety of classifiers to be
>>>handled. Again capability discovery would take care of differences.
>>So you want the flow to represent something that is not a flow. Again,
>>this comes back to the conversation in the other email. If this is
>>all about having a single ndo I'm sure we can find common grounds on
>>that.
>
>From what i understood (trying to summarize here for my own benefit):
>the switchdev api currently under review proposes every switch asic offload
>abstraction as a flow.
>It does not mandate this via code, however, there seems to be some discussion
>along those lines.
>
>The switchdev api flow ndo's need to stay for switch asic drivers that
>support flows directly or
>possibly want all their hw offload abstraction to be represented by the flow
>abstraction (openvswitch, the rocker dev). The details of how the flow is
>mapped to hw lies in the corresponding switch driver code.

Nod.

>
>We think rtnetlink is the api to model switch asic hw tables.
>We have a working model (Cumulus) that maps rtnetlink to switch
>asic hw tables (via snooping rtnetlink msgs). This can be done by extending
>the switchdev api
>with new ndo's for l2 and l3.
>
>Example:
>   new switchdev ndo's for fdb_add/fdb_del
>   new switchdev ndo's for l3

Nod.

>
>Now we only need working patches that implement switchdev api ndo ops for
>l2/l3 (this is in the works).
>
>As long as the current patches under review allow the extension of the api to
>cover non-flow based l2/l3 switch asic offloads, we might be good (?).

Yes. Flows are phase one. The api will be extended for whatever is
needed for l2/l3, as you said. Also I see a possibility to implement the
l2/l3 use case with flows as well. But generally, as with every
in-kernel api, we can extend it and change it.
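
To make the "l2 with flows" possibility a bit more concrete, here is a
rough userspace sketch of the NDA_* conversion Thomas listed in the quoted
text. Everything in it is an assumption made for illustration: struct
fdb_entry, struct flow_key and fdb_to_flow() are hypothetical, simplified
stand-ins, not the real struct sw_flow_key or any existing kernel
interface. The separate mask is also my own addition, used here only to
mark which fields the fdb entry actually specified, and the tci handling
ignores byte order and the tag-present bit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for an fdb entry built from NDA_* attributes. */
struct fdb_entry {
	uint8_t  lladdr[ETH_ALEN]; /* NDA_LLADDR */
	uint16_t vlan;             /* NDA_VLAN, 0 if not given */
	uint32_t vni;              /* NDA_VNI, 0 if not given */
	uint32_t ifindex;          /* NDA_IFINDEX, 0 if not given */
};

/* Hypothetical, much reduced flow key; NOT the real struct sw_flow_key. */
struct flow_key {
	struct { uint8_t dst[ETH_ALEN]; uint16_t tci; } eth;
	struct { uint32_t tun_id; } tun_key;
	struct { uint32_t in_port; } phys;
};

/*
 * Fill key and mask from an fdb entry. A field is matched (mask bits set)
 * only when the fdb entry specifies it; everything else stays wildcarded.
 */
static void fdb_to_flow(const struct fdb_entry *fdb,
			struct flow_key *key, struct flow_key *mask)
{
	assert(fdb && key && mask);

	memset(key, 0, sizeof(*key));
	memset(mask, 0, sizeof(*mask));

	/* NDA_LLADDR -> eth.dst, always an exact match */
	memcpy(key->eth.dst, fdb->lladdr, ETH_ALEN);
	memset(mask->eth.dst, 0xff, ETH_ALEN);

	/* NDA_VLAN -> eth.tci (VLAN id bits only, a simplification) */
	if (fdb->vlan) {
		key->eth.tci = fdb->vlan & 0x0fff;
		mask->eth.tci = 0x0fff;
	}

	/* NDA_VNI -> tun_key.tun_id */
	if (fdb->vni) {
		key->tun_key.tun_id = fdb->vni;
		mask->tun_key.tun_id = UINT32_MAX;
	}

	/* NDA_IFINDEX -> phys.in_port */
	if (fdb->ifindex) {
		key->phys.in_port = fdb->ifindex;
		mask->phys.in_port = UINT32_MAX;
	}
}
```

A driver that only understands flows could, in principle, take the
resulting key/mask pair and program it like any other flow, while a driver
with native l2 tables would keep using a dedicated fdb ndo. Treat this as a
sketch of the idea only, not as what the patches under review do.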