On Wed, Jun 22, 2016 at 9:17 PM, Yuval Mintz <yuval.mi...@qlogic.com> wrote: >> Then again, if you're basically saying every HW-assisted offload on >> receive should be done under LRO flag, what would be the use case >> where a GRO-assisted offload would help? > >> I.e., afaik LRO is superior to GRO in `brute force' - >> it creates better packed packets and utilizes memory better >> [with all the obvious cons such as inability for defragmentation]. >> So if you'd have the choice of having an adpater perform 'classic' >> LRO aggregation or something that resembles a GRO packet, >> what would be the gain from doing the latter?
LRO and GRO shouldn't really differ in packing or anything like that. The big difference between the two is that LRO is destructive while GRO is not. Specifically in the case of GRO you should be able to take the resultant frame, feed it through GSO, and get the original stream of frames back out. So you can pack the frames however you want the only key is that you must capture all the correct offsets and set the gso_size correct for the flow. > Just to relate to bnx2x/qede differences in current implementation - > when this GRO hw-offload was added to bnx2x, it has already > supported classical LRO, and due to above statement whenever LRO > was set driver aggregated incoming traffic as classic LRO. > I agree that in hindsight the lack of distinction between sw/hw GRO > was hurting us. In the case of bnx2x it sounds like you have issues that are significantly hurting the performance versus classic software GRO. If that is the case you might want to simply flip the logic for the module parameter that Rick mentioned and just disable the hardware assisted GRO unless it is specifically requested. > qede isn't implementing LRO, so we could easily mark this feature > under LRO there - but question is, given that the adapter can support > LRO, if we're going to suffer from all the shotrages that arise from > putting this feature under LRO, why should we bother? The idea is to address feature isolation. The fact is the hardware exists outside of kernel control. If you end up linking an internal kernel feature to your device like this you are essentially stripping the option of using the kernel feature. I would prefer to see us extend LRO to support "close enough GRO" instead of have us extend GRO to also include LRO. That way when we encounter issues like the FW limitation that Rick encountered he can just go in and disable the LRO and have true GRO kick in which would be significantly better than having to poke around through documentation to find a module parameter that can force the feature off. Really the fact that you have to use a module parameter is frowned upon as well as most drivers aren't supposed to be using those in the netdev tree. > You can argue that we might need a new feature bit for control > over such a feature; If we don't do that, is there any gain in all of this? I would argue that yes there are many cases where we will be able to show gain. The fact is there is a strong likelihood of the GRO on your parts having some differences either now, or at some point in the future as the code evolves. As I mentioned there was already some talk about possibly needing to push the UDP tunnel aggregation out of GRO and perhaps handling it sometime after IP look up had verified that the destination was in fact a local address in the namespace. In addition it makes the changes to include the tunnel encapsulation much more acceptable as LRO is already naturally dropped in the routing and bridging cases if I recall correctly. - Alex