On Aug 26, 2014, at 6:50 AM, Roopa Prabhu <[email protected]> wrote:
> On 8/25/14, 3:50 PM, Thomas Graf wrote:
>> On 08/25/14 at 12:15pm, Jamal Hadi Salim wrote:
>>> On 08/25/14 10:17, Thomas Graf wrote:
>>>> On 08/25/14 at 09:53am, Jamal Hadi Salim wrote:
>>>> fdb_add() *is* flow based. At least in my understanding, the whole
>>>> point here is to extend the idea of fdb_add() and make it understand
>>>> L2-L4 in a more generic way for the most common protocols.
>>>>
>>>> The reason fdb_add() is not reused is because it is Netlink specific
>>>> and only suitable for User -> HW offload. Kernel -> HW offload is
>>>> technically possible but not clean.
>>>>
>>> I don't think we have a problem handling any of this today.
>> Yes we do. It's restricted to L2 and we can't extend it easily
>> because it is based on NDA_*. The use of Netlink makes in-kernel
>> usage a pain. To me this is the sole reason for not using fdb_add()
>> in the first place. It seems absolutely clear though that fdb_add()
>> should be removed after the more generic ndo is in place providing
>> a superset of what fdb_add() can do today.
>>
>>> This is where our (shall I say strong) disagreement is.
>>> I think you will find it non-trivial to show me how you can
>>> actually take the simple L2 bridge and map it to a "flow".
>>> Since your starting point is "everything can be represented via a flow
>>> and some table" - we are at a crossroads.
>> OK, let me do the conversion for you:
>>
>>   NDA_DST         unused
>>   NDA_LLADDR      sw_flow_key.eth.dst
>>   NDA_CACHEINFO   unused
>>   NDA_PROBES      unused
>>   NDA_VLAN        sw_flow_key.eth.tci
>>   NDA_PORT        unused
>>   NDA_VNI         sw_flow_key.tun_key.tun_id
>>   NDA_IFINDEX     sw_flow_key.phys.in_port
>>   NDA_MASTER      unused
>>
>>> The tc filter API seems to be doing just that.
>>> You have different types of classifiers - the h/w may not be able
>>> to support some classifier types - but that is a capability discovery
>>> challenge.
>> Agreed but tc is only one out of many possible existing interfaces
>> we have. macvtap (given we want to extend beyond L2), routing,
>> OVS, bridge and eventually even things like a team device can and
>> should make use of offloads.
>>
>>> I am saying two things:
>>> 1) There are a few "fundamental" interfaces; L2 and L3 being some.
>>> Add crypto offload and a few I mentioned in my presentation. We
>> Can you share that preso? I was not present.
>>
>>> know how to do those. For example, there is nothing I can't do with
>>> the rtmsg that is L3, or the fdb/port/vlan filter for L2.
>>> This flow thing should stay out of those.
>> Let me remind you about the name of the structure behind all L3
>> forwarding decisions:
>>
>> struct flowi4 {
>> [...]
>> }
>>
>> Adding a route means adding a flow. Can we please stop the flow
>> bashing? The concept of a flow is very generic, well known and already
>> very present in the kernel.
>>
>> The sw_flow_key proposed comes close to flowi4. Some fields are
>> different. They can eventually get merged. The strict IPv4/IPv6
>> separation is what makes it non-obvious and probably why Jiri chose
>> the OVS representation. If you say rtmsg is complete then that clearly
>> is not the case. In particular VTEP fields, ARP, and TCP flags are
>> clearly missing for many uses.
>>
>> Again, I'm not saying flow is the ultimate answer to everything. It
>> is not. But a lot of hardware out there is aware of flows in combination
>> with some form of action execution. Non flow based hardware can have
>> their own classifier.
>>
>>> 2) The flow thing should allow a variety of classifiers to be
>>> handled. Again capability discovery would take care of differences.
>> So you want the flow to represent something that is not a flow. Again,
>> this comes back to the conversation in the other email. If this is
>> all about having a single ndo I'm sure we can find common ground on
>> that.
>
> From what I understood (trying to summarize here for my own benefit):
> the switchdev api currently under review proposes every switch asic offload
> abstraction as a flow.
> It does not mandate this via code, however, there seems to be some discussion
> along those lines.
>
> The switchdev api flow ndo's need to stay for switch asic drivers that
> support flows directly, or that want all their hw offload abstraction to be
> represented by the flow abstraction (openvswitch, the rocker dev). The
> details of how the flow is mapped to hw lie in the corresponding switch
> driver code.
>
> We think rtnetlink is the api to model switch asic hw tables.
> We have a working model (Cumulus) that maps rtnetlink to switch
> asic hw tables (via snooping rtnetlink msgs). This can be done by extending
> the switchdev api with new ndo's for l2 and l3.
>
I don’t see it that way. I believe sw_flow can be the intermediate
representation that spans flow-based and non-flow-based HW, and that bridges
the flow-based world and the traditional l2/l3 world.
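
To make that concrete, here is a rough sketch of an fdb entry expressed as
a flow key plus mask, following the NDA_* mapping Thomas listed earlier in
the thread. The struct and function names (example_fdb_entry,
example_flow_key, example_fdb_to_flow) are hypothetical and heavily reduced;
they are not the real sw_flow_key layout or any existing kernel API. Byte
order and the VLAN-present bit are ignored to keep it short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for the NDA_* attributes of one fdb entry. */
struct example_fdb_entry {
	uint8_t  lladdr[ETH_ALEN];	/* NDA_LLADDR */
	uint16_t vlan;			/* NDA_VLAN, 0 if not given */
	uint32_t vni;			/* NDA_VNI, 0 if not given */
	uint32_t ifindex;		/* NDA_IFINDEX */
};

/* Hypothetical, reduced flow key; only the fields the mapping uses. */
struct example_flow_key {
	struct { uint32_t in_port; } phys;
	struct { uint8_t dst[ETH_ALEN]; uint16_t tci; } eth;
	struct { uint64_t tun_id; } tun_key;
};

/*
 * Fill key/mask from an fdb entry. Fields the fdb entry does not
 * specify stay fully wildcarded (mask == 0).
 */
static void example_fdb_to_flow(const struct example_fdb_entry *fdb,
				struct example_flow_key *key,
				struct example_flow_key *mask)
{
	assert(fdb && key && mask);

	memset(key, 0, sizeof(*key));
	memset(mask, 0, sizeof(*mask));

	/* NDA_LLADDR -> eth.dst, exact match on all 48 bits */
	memcpy(key->eth.dst, fdb->lladdr, ETH_ALEN);
	memset(mask->eth.dst, 0xff, ETH_ALEN);

	/* NDA_VLAN -> eth.tci, match the 12-bit VLAN ID only */
	if (fdb->vlan) {
		key->eth.tci = fdb->vlan & 0x0fff;
		mask->eth.tci = 0x0fff;
	}

	/* NDA_VNI -> tun_key.tun_id, match the 24-bit VNI */
	if (fdb->vni) {
		key->tun_key.tun_id = fdb->vni & 0xffffff;
		mask->tun_key.tun_id = 0xffffff;
	}

	/* NDA_IFINDEX -> phys.in_port, as in the mapping above */
	key->phys.in_port = fdb->ifindex;
	mask->phys.in_port = UINT32_MAX;
}
```

The mask is what lets a non-flow-based l2 table fit here: an fdb entry is
just a flow that matches exactly on a few fields and wildcards the rest.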
> Example:
> new switchdev ndo's for fdb_add/fdb_del
> new switchdev ndo's for l3
>
> Now we only need working patches that implement switchdev api ndo ops for
> l2/l3 (this is in the works).
>
> As long as the current patches under review allow the extension of the api to
> cover non-flow based l2/l3 switch asic offloads, we might be good (?).
>
> Thanks,
> Roopa
>
>
>
-scott