From: Eric Dumazet <eduma...@google.com>
Date: Thu, 4 Feb 2021 23:44:17 +0100
> On Thu, Feb 4, 2021 at 11:14 PM Saeed Mahameed <sa...@kernel.org> wrote:
> >
> > On Thu, 2021-02-04 at 13:31 -0800, Eric Dumazet wrote:
> > > From: Eric Dumazet <eduma...@google.com>
> > >
> > > Commit c80794323e82 ("net: Fix packet reordering caused by GRO and
> > > listified RX cooperation") had the unfortunate effect of adding
> > > latencies in common workloads.
> > >
> > > Before the patch, GRO packets were immediately passed to
> > > upper stacks.
> > >
> > > After the patch, we can accumulate quite a lot of GRO
> > > packets (depending on NAPI budget).
> > >
> >
> > Why NAPI budget? Looking at the code, it seems to be more related to
> > MAX_GRO_SKBS * gro_normal_batch, since we are counting GRO SKBs as 1.
> >
>
> Simply because we call gro_normal_list() from napi_poll(),
> so we flush the napi rx_list every 64 packets under stress (assuming
> the NIC driver uses NAPI_POLL_WEIGHT), or more often when
> napi_complete_done() is called because the budget was not exhausted.

Saeed, Eric means that if we have e.g. 8 GRO packets with 8 segs each,
then rx_list will be flushed only after processing of 64 ingress frames
(this arithmetic is replayed in the sketch after the thread).

> GRO has always been able to keep MAX_GRO_SKBS in its layer, but no
> recent patch has changed this part.
>
> >
> > But maybe I am missing some information about the actual issue you
> > are hitting.
> >
>
> Well, the issue is precisely described in the changelog.
>
> > >
> > > My fix is counting in napi->rx_count the number of segments
> > > instead of the number of logical packets.
> > >
> > > Fixes: c80794323e82 ("net: Fix packet reordering caused by GRO and
> > > listified RX cooperation")
> > > Signed-off-by: Eric Dumazet <eduma...@google.com>
> > > Bisected-by: John Sperbeck <jsperb...@google.com>
> > > Tested-by: Jian Yang <jiany...@google.com>
> > > Cc: Maxim Mikityanskiy <maxi...@mellanox.com>
> > > Cc: Alexander Lobakin <aloba...@dlink.ru>

It's strange that mailmap didn't pick up my active email at pm.me.
Anyway, this fix looks correct to me. It restores Edward's original
logic, but without the spurious out-of-order deliveries. Moreover, the
pre-patch behaviour can easily be achieved by increasing
net.core.gro_normal_batch if needed.

Thanks!
Reviewed-by: Alexander Lobakin <aloba...@pm.me>

> > > Cc: Saeed Mahameed <sae...@mellanox.com>
> > > Cc: Edward Cree <ec...@solarflare.com>
> > > ---
> > >  net/core/dev.c | 11 ++++++-----
> > >  1 file changed, 6 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index a979b86dbacda9dfe31dd8b269024f7f0f5a8ef1..449b45b843d40ece7dd1e2ed6a5996ee1db9f591 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -5735,10 +5735,11 @@ static void gro_normal_list(struct napi_struct *napi)
> > >  /* Queue one GRO_NORMAL SKB up for list processing. If batch size exceeded,
> > >   * pass the whole batch up to the stack.
> > >   */
> > > -static void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb)
> > > +static void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb, int segs)
> > >  {
> > >  	list_add_tail(&skb->list, &napi->rx_list);
> > > -	if (++napi->rx_count >= gro_normal_batch)
> > > +	napi->rx_count += segs;
> > > +	if (napi->rx_count >= gro_normal_batch)
> > >  		gro_normal_list(napi);
> > >  }
> > >
> > > @@ -5777,7 +5778,7 @@ static int napi_gro_complete(struct napi_struct *napi, struct sk_buff *skb)
> > >  	}
> > >
> > >  out:
> > > -	gro_normal_one(napi, skb);
> > > +	gro_normal_one(napi, skb, NAPI_GRO_CB(skb)->count);
> >
> > Seems correct to me,
> >
> > Reviewed-by: Saeed Mahameed <sae...@nvidia.com>

Al
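To make the accounting change concrete, here is a minimal userspace
sketch (not kernel code): struct napi_model and its counters are
simplified stand-ins for the napi->rx_list bookkeeping, assuming the
default net.core.gro_normal_batch of 8. It replays the example above,
eight GRO packets carrying eight segments each, and counts how often the
modeled rx_list is flushed under each accounting scheme.

#include <stdio.h>

#define GRO_NORMAL_BATCH 8	/* default net.core.gro_normal_batch */

/* Simplified stand-in for the napi_struct fields discussed above. */
struct napi_model {
	int rx_count;	/* models napi->rx_count */
	int queued;	/* packets sitting in the modeled rx_list */
	int flushes;	/* how many times the list was passed up the stack */
};

/* Stand-in for gro_normal_list(): pass the batch up, reset counters. */
static void flush_list(struct napi_model *napi)
{
	if (!napi->queued)
		return;
	napi->flushes++;
	napi->queued = 0;
	napi->rx_count = 0;
}

/* Pre-fix accounting: every GRO super-packet counts as one. */
static void queue_one_old(struct napi_model *napi)
{
	napi->queued++;
	if (++napi->rx_count >= GRO_NORMAL_BATCH)
		flush_list(napi);
}

/* Fixed accounting: a GRO packet counts as its number of segments. */
static void queue_one_new(struct napi_model *napi, int segs)
{
	napi->queued++;
	napi->rx_count += segs;
	if (napi->rx_count >= GRO_NORMAL_BATCH)
		flush_list(napi);
}

int main(void)
{
	struct napi_model old = { 0 }, fixed = { 0 };
	int i;

	/* Eight GRO packets, each aggregating eight ingress frames. */
	for (i = 0; i < 8; i++) {
		queue_one_old(&old);
		queue_one_new(&fixed, 8);
	}

	/* Old: one flush after all 64 frames; fixed: a flush per packet. */
	printf("old accounting:   %d flush(es)\n", old.flushes);
	printf("fixed accounting: %d flush(es)\n", fixed.flushes);
	return 0;
}

With per-packet accounting the first flush only happens once 64 ingress
frames have been aggregated; with per-segment accounting each 8-segment
GRO packet already reaches the batch limit, so delivery latency drops
back to pre-c80794323e82 levels. If the old, larger batches are actually
wanted, they can be approximated by raising the sysctl (exposed at
/proc/sys/net/core/gro_normal_batch), as Alexander notes above.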