On Aug 26, 2014, at 6:50 AM, Roopa Prabhu <ro...@cumulusnetworks.com> wrote:
> On 8/25/14, 3:50 PM, Thomas Graf wrote:
>> On 08/25/14 at 12:15pm, Jamal Hadi Salim wrote:
>>> On 08/25/14 10:17, Thomas Graf wrote:
>>>> On 08/25/14 at 09:53am, Jamal Hadi Salim wrote:
>>>> fdb_add() *is* flow based. At least in my understanding, the whole
>>>> point here is to extend the idea of fdb_add() and make it understand
>>>> L2-L4 in a more generic way for the most common protocols.
>>>>
>>>> The reason fdb_add() is not reused is because it is Netlink specific
>>>> and only suitable for User -> HW offload. Kernel -> HW offload is
>>>> technically possible but not clean.
>>>>
>>> I don't think we have a problem handling any of this today.
>> Yes we do. It's restricted to L2 and we can't extend it easily
>> because it is based on NDA_*. The use of Netlink makes in-kernel
>> usage a pain. To me this is the sole reason for not using fdb_add()
>> in the first place. It seems absolutely clear though that fdb_add()
>> should be removed after the more generic ndo is in place providing
>> a superset of what fdb_add() can do today.
>>
>>> This is where our (shall I say strong) disagreement is.
>>> I think you will find it non-trivial to show me how you can
>>> actually take the simple L2 bridge and map it to a "flow".
>>> Since your starting point is "everything can be represented via a flow
>>> and some table" - we are at a crossroads.
>> OK, let me do the conversion for you:
>>
>> NDA_DST          unused
>> NDA_LLADDR       sw_flow_key.eth.dst
>> NDA_CACHEINFO    unused
>> NDA_PROBES       unused
>> NDA_VLAN         sw_flow_key.eth.tci
>> NDA_PORT         unused
>> NDA_VNI          sw_flow_key.tun_key.tun_id
>> NDA_IFINDEX      sw_flow_key.phys.in_port
>> NDA_MASTER       unused
>>
>>> The tc filter API seems to be doing just that.
>>> You have different types of classifiers - the h/w may not be able
>>> to support some classifier types - but that is a capability discovery
>>> challenge.
>> Agreed but tc is only one out of many possible existing interfaces
>> we have.
>> macvtap (given we want to extend beyond L2), routing,
>> OVS, bridge and eventually even things like a team device can and
>> should make use of offloads.
>>
>>> I am saying two things:
>>> 1) There are a few "fundamental" interfaces; L2 and L3 being some.
>>> Add crypto offload and a few I mentioned in my presentation. We
>> Can you share that preso? I was not present.
>>
>>> know how to do those. example; there is nothing I can't do with
>>> the rtmsg that is L3. or the fdb/port/vlan filter for L2.
>>> This flow thing should stay out of those.
>> Let me remind you about the name of the structure behind all L3
>> forwarding decisions:
>>
>>   struct flowi4 {
>>       [...]
>>   }
>>
>> Adding a route means adding a flow. Can we please stop the flow
>> bashing? The concept of a flow is very generic, well known and already
>> very present in the kernel.
>>
>> The sw_flow_key proposed comes close to flowi4. Some fields are
>> different. They can eventually get merged. The strict IPv4/IPv6
>> separation is what makes it non-obvious and probably why Jiri chose
>> the OVS representation. If you say rtmsg is complete then that clearly
>> is not the case. In particular VTEP fields, ARP, and TCP flags are
>> clearly missing for many uses.
>>
>> Again, I'm not saying flow is the ultimate answer to everything. It
>> is not. But a lot of hardware out there is aware of flows in combination
>> with some form of action execution. Non-flow-based hardware can have
>> its own classifier.
>>
>>> 2) The flow thing should allow a variety of classifiers to be
>>> handled. Again capability discovery would take care of differences.
>> So you want the flow to represent something that is not a flow. Again,
>> this comes back to the conversation in the other email. If this is
>> all about having a single ndo I'm sure we can find common ground on
>> that.
>
> From what I understood (trying to summarize here for my own benefit):
> the switchdev api currently under review proposes every switch asic offload
> abstraction as a flow.
> It does not mandate this via code, however, there seems to be some discussion
> along those lines.
>
> The switchdev api flow ndo's need to stay for switch asic drivers that
> support flows directly or
> possibly want all their hw offload abstraction to be represented by the flow
> abstraction (openvswitch, the rocker dev). The details of how the flow is
> mapped to hw lie in the corresponding switch driver code.
>
> We think rtnetlink is the api to model switch asic hw tables.
> We have a working model (Cumulus) that maps rtnetlink to switch
> asic hw tables (via snooping rtnetlink msgs). This can be done by extending
> the switchdev api
> with new ndo's for l2 and l3.
>

I don’t see it that way. I believe sw_flow can be the intermediary representation that spans flow-based and non-flow-based HW, and that bridges the flow-based world and the traditional l2/l3 world.

> Example:
> new switchdev ndo's for fdb_add/fdb_del
> new switchdev ndo's for l3
>
> Now we only need working patches that implement switchdev api ndo ops for
> l2/l3 (this is in the works).
>
> As long as the current patches under review allow the extension of the api to
> cover non-flow based l2/l3 switch asic offloads, we might be good (?).
>
> Thanks,
> Roopa
>

-scott

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
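
The NDA_* to sw_flow_key conversion listed in Thomas's quoted reply can be sketched in C roughly as follows. This is only an illustration under stated assumptions: demo_flow_key, demo_fdb_entry and demo_fdb_to_flow_key() are hypothetical names invented here, the structs model only the four fields that have a counterpart in the mapping, and byte order and the VLAN tag-present bit that a real TCI would carry are ignored. It is not the kernel's sw_flow_key nor any proposed ndo.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the flow key discussed in the
 * thread. Only the fields named in the NDA_* mapping are modelled. */
struct demo_flow_key {
    struct { uint8_t dst[6]; uint16_t tci; } eth;   /* L2 destination, VLAN */
    struct { uint64_t tun_id; } tun_key;            /* tunnel id (VNI) */
    struct { uint32_t in_port; } phys;              /* ingress port */
};

/* Hypothetical FDB entry holding the NDA_* values that map to a key field.
 * Attributes listed as "unused" in the mapping are left out. */
struct demo_fdb_entry {
    uint8_t  lladdr[6];   /* NDA_LLADDR  */
    uint16_t vlan;        /* NDA_VLAN    */
    uint32_t vni;         /* NDA_VNI     */
    uint32_t ifindex;     /* NDA_IFINDEX */
};

/* Fill a flow key from an FDB entry, one assignment per mapped attribute. */
void demo_fdb_to_flow_key(const struct demo_fdb_entry *fdb,
                          struct demo_flow_key *key)
{
    assert(fdb != NULL && key != NULL);

    memset(key, 0, sizeof(*key));
    memcpy(key->eth.dst, fdb->lladdr, sizeof(key->eth.dst)); /* NDA_LLADDR  */
    key->eth.tci = fdb->vlan;          /* NDA_VLAN (assumption: raw VLAN id) */
    key->tun_key.tun_id = fdb->vni;    /* NDA_VNI     */
    key->phys.in_port = fdb->ifindex;  /* NDA_IFINDEX */
}
```

In this sketch an L2 forwarding entry becomes a flow key in which every field outside the mapping stays zero; a real implementation would also need a mask to mark those fields as wildcarded, which the sketch leaves out.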