On Aug 26, 2014, at 6:50 AM, Roopa Prabhu <ro...@cumulusnetworks.com> wrote:

> On 8/25/14, 3:50 PM, Thomas Graf wrote:
>> On 08/25/14 at 12:15pm, Jamal Hadi Salim wrote:
>>> On 08/25/14 10:17, Thomas Graf wrote:
>>>> On 08/25/14 at 09:53am, Jamal Hadi Salim wrote:
>>>> fdb_add() *is* flow based. At least in my understanding, the whole
>>>> point here is to extend the idea of fdb_add() and make it understand
>>>> L2-L4 in a more generic way for the most common protocols.
>>>> 
>>>> The reason fdb_add() is not reused is that it is Netlink-specific
>>>> and only suitable for User -> HW offload. Kernel -> HW offload is
>>>> technically possible but not clean.
>>>> 
>>> I don't think we have a problem handling any of this today.
>> Yes we do. It's restricted to L2 and we can't extend it easily
>> because it is based on NDA_*. The use of Netlink makes in-kernel
>> usage a pain. To me this is the sole reason for not using fdb_add()
>> in the first place. It seems absolutely clear though that fdb_add()
>> should be removed after the more generic ndo is in place providing
>> a superset of what fdb_add() can do today.
>> 
>>> This is where our (shall I say strong) disagreement is.
>>> I think you will find it non-trivial to show me how you can
>>> actually take the simple L2 bridge and map it to a "flow".
>>> Since your starting point is "everything can be represented via a flow
>>> and some table", we are at a crossroads.
>> OK, let me do the conversion for you:
>> 
>> NDA_DST        unused
>> NDA_LLADDR     sw_flow_key.eth.dst
>> NDA_CACHEINFO  unused
>> NDA_PROBES     unused
>> NDA_VLAN       sw_flow_key.eth.tci
>> NDA_PORT       unused
>> NDA_VNI        sw_flow_key.tun_key.tun_id
>> NDA_IFINDEX    sw_flow_key.phys.in_port
>> NDA_MASTER     unused
>> 
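To make the mapping table concrete, here is a minimal C sketch of the translation it describes. The structs are hypothetical, simplified stand-ins: the real sw_flow_key in the Open vSwitch datapath has many more fields and uses big-endian types for several of them, and nda_fdb_attrs is an invented, already-parsed view of the NDA_* attributes, not something fdb_add() receives today. Attributes the table marks "unused" simply have no slot.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for sw_flow_key: only the fields
 * named in the table, all in host byte order. */
struct sw_flow_key_sketch {
	struct {
		uint64_t tun_id;	/* from NDA_VNI */
	} tun_key;
	struct {
		uint32_t in_port;	/* from NDA_IFINDEX */
	} phys;
	struct {
		uint8_t dst[6];		/* from NDA_LLADDR */
		uint16_t tci;		/* from NDA_VLAN */
	} eth;
};

/* Hypothetical, already-parsed view of one FDB entry's NDA_* attributes
 * (absent optional attributes are 0). */
struct nda_fdb_attrs {
	uint8_t lladdr[6];	/* NDA_LLADDR */
	uint16_t vlan;		/* NDA_VLAN (VLAN ID) */
	uint32_t vni;		/* NDA_VNI */
	uint32_t ifindex;	/* NDA_IFINDEX */
};

/* Fill a flow key from the attributes that have a counterpart; NDA_DST,
 * NDA_CACHEINFO, NDA_PROBES, NDA_PORT and NDA_MASTER have no slot. */
void nda_to_flow_key(const struct nda_fdb_attrs *a,
		     struct sw_flow_key_sketch *key)
{
	assert(a != NULL && key != NULL);
	memset(key, 0, sizeof(*key));
	memcpy(key->eth.dst, a->lladdr, sizeof(key->eth.dst));
	key->eth.tci = a->vlan & 0x0fff;	/* VID is the low 12 bits of the TCI */
	key->tun_key.tun_id = a->vni;
	key->phys.in_port = a->ifindex;
}
```

A caller would fill nda_fdb_attrs from the parsed attributes and hand the resulting key to whichever offload ndo ends up consuming it.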
>>> The tc filter API seems to be doing just that.
>>> You have different types of classifiers - the h/w may not be able
>>> to support some classifier types - but that is a capability discovery
>>> challenge.
>> Agreed, but tc is only one of many possible existing interfaces
>> we have. macvtap (given we want to extend beyond L2), routing,
>> OVS, bridge and eventually even things like a team device can and
>> should make use of offloads.
>> 
>>> I am saying two things:
>>> 1) There are a few "fundamental" interfaces; L2 and L3 are among them.
>>> Add crypto offload and a few I mentioned in my presentation. We
>> Can you share that preso? I was not present.
>> 
>>> know how to do those. Example: there is nothing I can't do with
>>> the rtmsg that is L3, or with the fdb/port/vlan filter for L2.
>>> This flow thing should stay out of those.
>> Let me remind you about the name of the structure behind all L3
>> forwarding decisions:
>> 
>>         struct flowi4 {
>>                 [...]
>>         };
>> 
>> Adding a route means adding a flow. Can we please stop the flow
>> bashing? The concept of a flow is very generic, well known and already
>> very present in the kernel.
>> 
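The "adding a route means adding a flow" point can be illustrated by expressing a route prefix as a masked match on the destination address. This is a hypothetical sketch: ipv4_flow_match, route_to_match and flow_matches are invented names, not flowi4 or any kernel API, and it models a single entry only; real route lookup additionally selects the longest matching prefix among many entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical masked IPv4 match: a route prefix is a key plus a mask. */
struct ipv4_flow_match {
	uint32_t daddr;		/* key, host byte order */
	uint32_t daddr_mask;	/* bits set to 1 must match */
};

/* Express a route prefix such as 10.0.0.0/8 as a masked flow match. */
struct ipv4_flow_match route_to_match(uint32_t prefix, unsigned int plen)
{
	struct ipv4_flow_match m;

	assert(plen <= 32);
	/* Shifting a 32-bit value by 32 is undefined, so /0 is special-cased. */
	m.daddr_mask = plen ? 0xffffffffu << (32 - plen) : 0;
	m.daddr = prefix & m.daddr_mask;
	return m;
}

/* A packet "hits" the route exactly when its masked address equals the key. */
bool flow_matches(const struct ipv4_flow_match *m, uint32_t daddr)
{
	return (daddr & m->daddr_mask) == m->daddr;
}
```

For example, route_to_match(0x0a000000, 8) yields a mask of 0xff000000, so 10.1.2.3 matches and 11.0.0.1 does not.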
>> The proposed sw_flow_key comes close to flowi4. Some fields are
>> different; they can eventually be merged. The strict IPv4/IPv6
>> separation is what makes this non-obvious, and is probably why Jiri
>> chose the OVS representation. If you are saying rtmsg is complete, that
>> is clearly not the case. In particular, VTEP fields, ARP, and TCP flags
>> are missing for many uses.
>> 
>> Again, I'm not saying flow is the ultimate answer to everything. It
>> is not. But a lot of hardware out there is aware of flows in combination
>> with some form of action execution. Non-flow-based hardware can have
>> its own classifier.
>> 
>>> 2) The flow thing should allow a variety of classifiers to be
>>> handled. Again capability discovery would take care of differences.
>> So you want the flow to represent something that is not a flow. Again,
>> this comes back to the conversation in the other email. If this is
>> all about having a single ndo, I'm sure we can find common ground on
>> that.
> 
> From what I understood (trying to summarize here for my own benefit):
> the switchdev API currently under review proposes representing every
> switch ASIC offload abstraction as a flow. It does not mandate this in
> code; however, there seems to be some discussion along those lines.
>
> The switchdev API flow ndos need to stay for switch ASIC drivers that
> support flows directly, or that want all of their HW offload abstractions
> to be represented by the flow abstraction (openvswitch, the rocker dev).
> The details of how a flow is mapped to HW lie in the corresponding
> switch driver code.
>
> We think rtnetlink is the API to model switch ASIC HW tables.
> We have a working model (Cumulus) that maps rtnetlink to switch
> ASIC HW tables (by snooping rtnetlink msgs). This can be done by
> extending the switchdev API with new ndos for L2 and L3.
> 
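The Cumulus implementation is not described here in detail, so the following is only a generic sketch of what snooping rtnetlink messages from userspace can look like: open a NETLINK_ROUTE socket, subscribe to the neighbour and IPv4 route multicast groups (bridge FDB changes are, to my understanding, also announced as neighbour messages), and dispatch each message by type. The socket calls and NLMSG_* macros are the standard Linux netlink API; open_rtnl_listener, dispatch_rtnl and rtnl_counts are hypothetical, and a real agent would parse the ndmsg/rtmsg payloads and program the ASIC rather than count messages.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Open a NETLINK_ROUTE socket subscribed to neighbour and IPv4 route
 * notifications. Returns the fd, or -1 on error. */
int open_rtnl_listener(void)
{
	struct sockaddr_nl sa = {
		.nl_family = AF_NETLINK,
		.nl_groups = RTMGRP_NEIGH | RTMGRP_IPV4_ROUTE,
	};
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return -1;
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical stand-in for the layer that would program the hardware. */
struct rtnl_counts {
	int neigh_add, neigh_del, route_add, route_del;
};

/* Walk one received buffer and dispatch each netlink message by type. */
void dispatch_rtnl(const void *buf, size_t len, struct rtnl_counts *c)
{
	const struct nlmsghdr *nh;
	int rem = (int)len;

	assert(c != NULL);
	for (nh = buf; NLMSG_OK(nh, rem); nh = NLMSG_NEXT(nh, rem)) {
		switch (nh->nlmsg_type) {
		case RTM_NEWNEIGH: c->neigh_add++; break;
		case RTM_DELNEIGH: c->neigh_del++; break;
		case RTM_NEWROUTE: c->route_add++; break;
		case RTM_DELROUTE: c->route_del++; break;
		default: break;	/* ignore everything else */
		}
	}
}
```

In a receive loop, each buffer filled by recv() on the listener fd would be passed to dispatch_rtnl().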

I don’t see it that way. I believe sw_flow can be the intermediary
representation that spans flow-based and non-flow-based HW, and that
bridges the flow-based world and the traditional L2/L3 world.
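The idea of sw_flow as an intermediary for non-flow-based hardware can be sketched as follows: an L2-only driver inspects the rule it is handed and, when the rule has FDB shape (match on destination MAC, optionally VLAN, then forward to a port), turns it into an FDB entry; anything else is refused with -EOPNOTSUPP so it can stay in software. flow_rule, fdb_entry and rule_to_fdb are hypothetical names, not the API of the switchdev patches under review.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bits marking which key fields of a rule are significant. */
enum { RF_ETH_DST = 1u << 0, RF_VLAN = 1u << 1, RF_IPV4_DST = 1u << 2 };

/* Hypothetical sw_flow-like rule: match fields plus one output action. */
struct flow_rule {
	uint32_t fields;	/* significant key fields (RF_*) */
	uint8_t eth_dst[6];
	uint16_t vid;
	uint32_t ipv4_dst;
	uint32_t out_port;	/* action: forward to this port */
};

/* Hypothetical FDB entry as an L2-only ASIC would store it. */
struct fdb_entry {
	uint8_t mac[6];
	uint16_t vid;
	uint32_t port;
};

/* Accept only FDB-shaped rules; everything else is left to software. */
int rule_to_fdb(const struct flow_rule *r, struct fdb_entry *e)
{
	assert(r != NULL && e != NULL);
	if (!(r->fields & RF_ETH_DST))
		return -EOPNOTSUPP;	/* an FDB entry needs a MAC */
	if (r->fields & ~(uint32_t)(RF_ETH_DST | RF_VLAN))
		return -EOPNOTSUPP;	/* matches on fields the FDB lacks */
	memcpy(e->mac, r->eth_dst, sizeof(e->mac));
	e->vid = (r->fields & RF_VLAN) ? r->vid : 0;
	e->port = r->out_port;
	return 0;
}
```

The design choice in this sketch is that the driver, not the core, decides which flows fit its tables.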


> Example:
>  new switchdev ndos for fdb_add/fdb_del
>  new switchdev ndos for L3
> 
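A minimal sketch of the kind of ndos listed above, assuming they take plain C arguments instead of Netlink attributes so that in-kernel users such as the bridge could call them directly (the fdb_add() limitation raised earlier in the thread). switch_ops, the toy driver and switch_route_add are hypothetical; the L3 op is left NULL to show how a device without L3 tables might be handled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical switch-offload ops with plain C arguments. */
struct switch_ops {
	int (*fdb_add)(void *priv, const uint8_t mac[6], uint16_t vid,
		       uint32_t port);
	int (*fdb_del)(void *priv, const uint8_t mac[6], uint16_t vid);
	int (*route_add)(void *priv, uint32_t prefix, uint8_t plen,
			 uint32_t nexthop);	/* NULL: no L3 offload */
};

/* Toy L2-only "driver": a small fixed-size FDB table. */
#define TOY_FDB_SIZE 4
struct toy_switch {
	struct {
		uint8_t mac[6];
		uint16_t vid;
		uint32_t port;
		int used;
	} fdb[TOY_FDB_SIZE];
};

static int toy_fdb_add(void *priv, const uint8_t mac[6], uint16_t vid,
		       uint32_t port)
{
	struct toy_switch *sw = priv;

	for (size_t i = 0; i < TOY_FDB_SIZE; i++) {
		if (!sw->fdb[i].used) {
			memcpy(sw->fdb[i].mac, mac, 6);
			sw->fdb[i].vid = vid;
			sw->fdb[i].port = port;
			sw->fdb[i].used = 1;
			return 0;
		}
	}
	return -ENOSPC;	/* hardware table full */
}

static int toy_fdb_del(void *priv, const uint8_t mac[6], uint16_t vid)
{
	struct toy_switch *sw = priv;

	for (size_t i = 0; i < TOY_FDB_SIZE; i++) {
		if (sw->fdb[i].used && sw->fdb[i].vid == vid &&
		    memcmp(sw->fdb[i].mac, mac, 6) == 0) {
			sw->fdb[i].used = 0;
			return 0;
		}
	}
	return -ENOENT;
}

const struct switch_ops toy_ops = {
	.fdb_add = toy_fdb_add,
	.fdb_del = toy_fdb_del,
	.route_add = NULL,	/* this toy device has no L3 tables */
};

/* Core-side wrapper: a missing op is reported as unsupported so the
 * caller can fall back to software forwarding. */
int switch_route_add(const struct switch_ops *ops, void *priv,
		     uint32_t prefix, uint8_t plen, uint32_t nexthop)
{
	assert(ops != NULL);
	if (!ops->route_add)
		return -EOPNOTSUPP;
	return ops->route_add(priv, prefix, plen, nexthop);
}
```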
> Now we only need working patches that implement the switchdev API ndo
> ops for L2/L3 (this is in the works).
>
> As long as the current patches under review allow the API to be extended
> to cover non-flow-based L2/L3 switch ASIC offloads, we might be good (?).
> 
> Thanks,
> Roopa


-scott



_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev