On Thu, Dec 03, 2020 at 23:57, Vladimir Oltean <olte...@gmail.com> wrote:
> On Thu, Dec 03, 2020 at 09:53:08PM +0100, Tobias Waldekranz wrote:
>> I am not sure which way is the correct one, but I do not think that it
>> is necessarily _always_ correct to silently fall back to a
>> non-offloaded mode.
>
> Of course, neither is fully correct. There is always more to improve on
> the communication side of things. But DSA, which stands for "Not Just A
> Switch", promises you, above all, a port multiplexer with the ability to
> make full use of the network stack. Maybe I'm still under the influence
> of a recent customer meeting I had, but there is a large base of use
> cases there, where people just don't seem to have enough ports, and they
> can just add a $3 switch to their system, and voila, problem solved.
> Everything works absolutely the same as before. The newly added switch
> ports are completely integrated with the kernel's IP routing database.
> They aren't even switch ports, they're just ports.
>
> And even there, on the software fallback front, we don't do enough
> currently. I've had other customers who said that they even _prefer_
> to do bridging in software, in order to get a chance to run their
> netfilter/ebtables based firewall on their traffic. Of course, DSA
> currently has no idea when netfilter rules are installed, so I just told
> them to hack the driver and remove the bridging offload, which they
> happily did.
Totally agree with your customer; we should be able to disable all
offloading for an individual port and run it in a plain "NIC mode". In
fact, that might be what opens up a third option for how LAG offloading
should be handled.

> I suspect you're looking at this from the wrong angle. I did, too, for
> the longest time, because I was focused on squeezing the most out of the
> hardware I had. And "I feel cheated because the operating system sits
> between me and the performance I want from my hardware" is an argument
> I've heard all too often. But not everybody needs even gigabits of
> traffic, or absurdly low latency. Making a product that works acceptably
> and at a low cost is better than hand-picking only hardware that can
> accelerate everything and spending $$$$ on it, for a performance
> improvement that no user will notice. Doing link aggregation in software
> is fine. Doing bridging in software is fine. Sure, the CPU can't compete
> past a number of kpps, but maybe it doesn't even have to.

I really get the need for being able to apply some CPU duct tape to
solve that final corner case in a network, or to use it as a cheap port
expander.

> Again, this doesn't mean that we should not report, somehow, what is
> offloaded and what isn't, and more importantly, the reason why
> offloading the set of required actions is not supported. And I do
> realize that I wrote a long rant about something that has nothing to do
> with the topic at hand, which is: should we deny bonding interfaces that
> use balance-rr, active-backup, broadcast, balance-tlb, balance-alb on
> top of a DSA interface? Hell no, you wouldn't expect an Intel e1000 card
> to error out when you set up a bonding interface that isn't xor or
> 802.3ad, would you? But you wouldn't go ahead and set up bridging
> offload either, therefore running into the problem I raised above with
> netfilter rules.
> You would just set out to do what the user asked for, in the way that
> you can, and let them decide if the performance is what they need or
> not.

You make a lot of good points. I think it might be better to force the
user to be explicit about their choice, though. Imagine something like
this:

- We add NETIF_F_SWITCHDEV_OFFLOAD, which is set on switchdev ports by
  default. This flag is only allowed to be toggled when the port has no
  uppers - we do not want to deal with a port in a LAG in a bridge all
  of a sudden changing mode.

- If it is set, we only allow uppers/tc-rules/etc. that we can offload.
  If the user tries to configure something outside of that, we can
  suggest disabling offloading in the error we emit.

- If it is not set, we just sit back and let the kernel do its thing.

This would work well both for exotic LAG modes and for advanced
netfilter(ebtables)/tc setups, I think. Example session:

    $ ip link add dev bond0 type bond mode balance-rr
    $ ip link set dev swp0 master bond0
    Error: swp0: balance-rr not supported when using switchdev offloading
    $ ethtool -K swp0 switchdev off
    $ ip link set dev swp0 master bond0
    $ echo $?
    0

>> > Should we add an array of supported TX types that the switch port can
>> > offload, and that should be checked by DSA in dsa_lag_offloading?
>>
>> That would work. We could also create a new DSA op that we would call
>> for each chip from PRECHANGEUPPER to verify that it is supported. One
>> advantage with this approach is that we can just pass along the `struct
>> netdev_lag_upper_info` so drivers always have access to all information,
>> in the event that new fields are added to it, for example.
>
> Hmm, not super pleased to have yet another function pointer, but I don't
> have any other idea, so I guess that works just fine.

Yeah, I do not like it either; maybe it is just the least bad thing.
> I would be completely surprised to see hardware that can offload
> anything else in the near future.

If you tilt your head a little, I think active-backup is really just a
trivial case of a hashed LAG in which only a single member is ever
active, i.e. all buckets are always allocated to one port (effectively
negating the hashing). The active member is controlled by software, so I
think we should be able to support that.

mv88e6xxx could also theoretically be made to support broadcast. You can
enable any given bucket on multiple ports, but that seems silly.