On Fri, Oct 05, 2018 at 10:41:47AM -0400, Willem de Bruijn wrote:
> On Fri, Oct 5, 2018 at 9:53 AM Paolo Abeni <pab...@redhat.com> wrote:
> >
> > Hi all,
> >
> > On Fri, 2018-09-14 at 13:59 -0400, Willem de Bruijn wrote:
> > > This is a *very rough* draft. Mainly for discussion while we also
> > > look at another partially overlapping approach [1].
> >
> > I'm wondering how we go on from this ? I'm fine with either approaches.
>
> Let me send the udp gro static_key patch. Then we don't need
> the enable udp on demand logic (patch 2/4).
>
> Your implementation of GRO is more fleshed out (patch 3/4) than
> my quick hack. My only request would be to use a separate
> UDP_GRO socket option instead of adding this to the existing
> UDP_SEGMENT.
>
> Sounds good?
>
> > Also, I'm interested in [try to] enable GRO/GSO batching in the
> > forwarding path, as you outlined initially in the GSO series
> > submission. That should cover Steffen use-case, too, right?
>
> Great. Indeed. Though there is some unresolved discussion on
> one large gso skb vs frag list. There has been various concerns
> around the use of frag lists for GSO in the past, and it does not
> match h/w offload. So I think the answer would be the first unless
> the second proves considerably faster (in which case it could also
> be added later as optimization).

I think whether the first or the second approach is faster depends a
bit on the use case, the hardware etc. So it would be good if we could
choose which one to use depending on that.

For local socket receiving, building big GSO packets is likely faster
than the chaining method. But on forwarding, the chaining method might
be faster because we don't have the overhead of creating GSO packets
and of segmenting them back to their native form (at least as long as
we don't have NICs that support hardware UDP GSO). The same applies to
packets that undergo IPsec transformation.

Another case where the chaining method could be interesting is when we
receive already big LRO or HW GRO packets from the NIC. Packets of the
same flow could still travel together through the stack with the
chaining method. I've never tried this, though. For now it is just an
idea.

I have the code for the chaining method here, I'd just need some way
to hook it in. Maybe it could be done with some sort of an
inet_update_offload() as Paolo already proposed in his patchset, or we
could make it configurable per device...
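
To make the difference between the two approaches a bit more concrete,
here is a rough userspace toy model. It is not the kernel code and not
the chaining code mentioned above: struct toy_pkt, struct toy_gso and
all toy_* helpers are hypothetical names made up for illustration. The
memcpy() only stands in for the merge work and is an assumption of the
model, since real GRO typically attaches page fragments instead of
copying payload.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy packet, NOT struct sk_buff. */
struct toy_pkt {
	const unsigned char *data;
	size_t len;
	struct toy_pkt *next;	/* used only by the chaining approach */
};

/* Hypothetical "one large GSO packet": merged payload plus segment size. */
struct toy_gso {
	unsigned char buf[65536];
	size_t len;
	size_t seg_size;
};

/*
 * Approach 1: merge all payloads into one buffer and remember the
 * segment size so the packet can be split again later.
 * Returns the number of bytes that had to be touched.
 */
static size_t toy_coalesce(struct toy_gso *g, const struct toy_pkt *pkts,
			   size_t n)
{
	size_t copied = 0;

	g->len = 0;
	g->seg_size = n ? pkts[0].len : 0;
	for (size_t i = 0; i < n; i++) {
		assert(g->len + pkts[i].len <= sizeof(g->buf));
		memcpy(g->buf + g->len, pkts[i].data, pkts[i].len);
		g->len += pkts[i].len;
		copied += pkts[i].len;
	}
	return copied;
}

/* Number of packets we get when segmenting the merged packet back. */
static size_t toy_segment_count(const struct toy_gso *g)
{
	if (g->seg_size == 0)
		return 0;
	return (g->len + g->seg_size - 1) / g->seg_size;
}

/*
 * Approach 2: leave every packet in its native form and only link
 * them, so the flow travels as one unit. No payload is touched.
 */
static struct toy_pkt *toy_chain(struct toy_pkt *pkts, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pkts[i].next = (i + 1 < n) ? &pkts[i + 1] : NULL;
	return n ? &pkts[0] : NULL;
}

/* "Unchaining" is just a list walk, each packet is already final. */
static size_t toy_chain_len(const struct toy_pkt *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

In this model the coalescing path pays for the merge and later for the
split, while the chained path only sets pointers. That is the overhead
argument for forwarding made above; for local receive the single large
packet is still the more natural thing to hand to the socket.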