On Mon, Mar 22, 2021 at 1:00 PM Paolo Abeni <pab...@redhat.com> wrote: > > On Mon, 2021-03-22 at 09:30 -0400, Willem de Bruijn wrote: > > On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pab...@redhat.com> wrote: > > > After the previous patch the stack can do L4 UDP aggregation > > > on top of an UDP tunnel. > > > > > > The current GRO complete code tries frag based aggregation first; > > > in the above scenario will generate corrupted frames. > > > > > > We need to try first UDP tunnel based aggregation, if the GRO > > > packet requires that. We can use time GRO 'encap_mark' field > > > to track the need GRO complete action. If encap_mark is set, > > > skip the frag_list aggregation. > > > > > > On tunnel encap GRO complete clear such field, so that an inner > > > frag_list GRO complete could take action. > > > > > > Signed-off-by: Paolo Abeni <pab...@redhat.com> > > > --- > > > net/ipv4/udp_offload.c | 8 +++++++- > > > net/ipv6/udp_offload.c | 3 ++- > > > 2 files changed, 9 insertions(+), 2 deletions(-) > > > > > > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > > > index 25134a3548e99..54e06b88af69a 100644 > > > --- a/net/ipv4/udp_offload.c > > > +++ b/net/ipv4/udp_offload.c > > > @@ -642,6 +642,11 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff, > > > skb_shinfo(skb)->gso_type = uh->check ? > > > SKB_GSO_UDP_TUNNEL_CSUM > > > : SKB_GSO_UDP_TUNNEL; > > > > > > + /* clear the encap mark, so that inner frag_list > > > gro_complete > > > + * can take place > > > + */ > > > + NAPI_GRO_CB(skb)->encap_mark = 0; > > > + > > > /* Set encapsulation before calling into inner > > > gro_complete() > > > * functions to make them set up the inner offsets. > > > */ > > > @@ -665,7 +670,8 @@ INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct > > > sk_buff *skb, int nhoff) > > > const struct iphdr *iph = ip_hdr(skb); > > > struct udphdr *uh = (struct udphdr *)(skb->data + nhoff); > > > > > > - if (NAPI_GRO_CB(skb)->is_flist) { > > > + /* do fraglist only if there is no outer UDP encap (or we already > > > processed it) */ > > > + if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) { > > > > Sorry, I don't follow. I thought the point was to avoid fraglist if an > > outer udp tunnel header is present. But the above code clears the mark > > and allows entering the fraglist branch exactly when such a header is > > encountered? > > The relevant UDP packet has gone through: > > [l2/l3 GRO] -> udp_gro_receive -> udp_sk(sk)->gro_receive -> [some > more GRO layers] -> udp_gro_receive (again) > > The first udp_gro_receive set NAPI_GRO_CB(skb)->encap_mark, the > latter udp_gro_receive set NAPI_GRO_CB(skb)->is_flist. > > Then, at GRO complete time: > > [l2/l3 GRO] -> udp{4,6}_gro_complete -> udp_sk(sk)->gro_complete -> > [more GRO layers] -> udp{4,6}_gro_complete (again). > > In the first udp{4,6}_gro_complete invocation 'encap_mark' is 1, so > with this patch we do the 'udp_sk(sk)->gro_complete' path. In the > second udp{4,6}_gro_complete invocation 'encap_mark' has been cleared > (by udp_gro_complete), so we do the SKB_GSO_FRAGLIST completion. > > In case SKB_GSO_FRAGLIST with no UDP tunnel, 'encap_mark' is 0 and we > do the SKB_GSO_FRAGLIST completion. > > Another alternative, possibly more readable, would be avoid clearing > 'encap_mark' in udp_gro_complete() and replacing the above check with: > > if (NAPI_GRO_CB(skb)->is_flist && > (!NAPI_GRO_CB(skb)->encap_mark || > (NAPI_GRO_CB(skb)->encap_mark && skb->encapsulation))) { > > I opted otherwise to simplify the conditional expression. > > Please let me know if the above is somewhat more clear and if you have > preferecens between the two options.
Got it now, thanks. Yes, this patch makes sense. When the GRO layer has built a GSO packet with both inner UDP (GSO_UDP_L4) and UDP tunnel header (GSO_UDP_TUNNEL), then on completion udp{4,6}_gro_complete will be called twice. This function will enter its is_flist branch immediately, even though that is only correct on the second call, as GSO_FRAGLIST is only relevant for the inner packet. Skip this while encap_mark == 1, identifying processing of the outer tunnel header. Clear the field to enter the path on the next round, for the inner header. This is subject to only supporting a single layer of encap, but that is indeed the current state afaik.