On Wed, Aug 17, 2016 at 5:17 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> From: Eric Dumazet <eduma...@google.com>
>
> Over the years, TCP BDP has increased a lot, and is typically
> on the order of ~10 Mbytes with the help of clever Congestion Control
> modules.
>
> In the presence of packet losses, TCP stores incoming packets into an out
> of order queue, and the number of skbs sitting there waiting for the
> missing packets to be received can match the BDP (~10 Mbytes).
>
> In some cases, TCP needs to make room for incoming skbs, and the current
> strategy can simply remove all skbs in the out of order queue as a last
> resort, incurring a huge penalty, both for receiver and sender.
>
> Unfortunately these 'last resort events' are quite frequent, forcing the
> sender to send all packets again, stalling the flow and wasting a lot of
> resources.
>
> This patch cleans only a part of the out of order queue in order
> to meet the memory constraints.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Cc: Neal Cardwell <ncardw...@google.com>
> Cc: Yuchung Cheng <ych...@google.com>
> Cc: Soheil Hassas Yeganeh <soh...@google.com>
> Cc: C. Stephen Gun <c...@google.com>
> Cc: Van Jacobson <v...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>

> ---
>  net/ipv4/tcp_input.c | 47 ++++++++++++++++++++++++-----------------
>  1 file changed, 28 insertions(+), 19 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 3ebf45b38bc309f448dbc4f27fe8722cefabaf19..8cd02c0b056cbc22e2e4a4fe8530b74f7bd25419 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4392,12 +4392,9 @@ static int tcp_try_rmem_schedule(struct sock *sk, struct sk_buff *skb,
>                 if (tcp_prune_queue(sk) < 0)
>                         return -1;
>
> -               if (!sk_rmem_schedule(sk, skb, size)) {
> +               while (!sk_rmem_schedule(sk, skb, size)) {
>                         if (!tcp_prune_ofo_queue(sk))
>                                 return -1;
> -
> -                       if (!sk_rmem_schedule(sk, skb, size))
> -                               return -1;
>                 }
>         }
>         return 0;
> @@ -4874,29 +4871,41 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
>  }
>
>  /*
> - * Purge the out-of-order queue.
> - * Return true if queue was pruned.
> + * Clean the out-of-order queue to make room.
> + * We drop high sequences packets to :
> + * 1) Let a chance for holes to be filled.
> + * 2) not add too big latencies if thousands of packets sit there.
> + *    (But if application shrinks SO_RCVBUF, we could still end up
> + *     freeing whole queue here)
> + *
> + * Return true if queue has shrunk.
>   */
>  static bool tcp_prune_ofo_queue(struct sock *sk)
>  {
>         struct tcp_sock *tp = tcp_sk(sk);
> -       bool res = false;
> +       struct sk_buff *skb;
>
> -       if (!skb_queue_empty(&tp->out_of_order_queue)) {
> -               NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
> -               __skb_queue_purge(&tp->out_of_order_queue);
> +       if (skb_queue_empty(&tp->out_of_order_queue))
> +               return false;
>
> -               /* Reset SACK state. A conforming SACK implementation will
> -                * do the same at a timeout based retransmit. When a connection
> -                * is in a sad state like this, we care only about integrity
> -                * of the connection not performance.
> -                */
> -               if (tp->rx_opt.sack_ok)
> -                       tcp_sack_reset(&tp->rx_opt);
> +       NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
> +
> +       while ((skb = __skb_dequeue_tail(&tp->out_of_order_queue)) != NULL) {
> +               tcp_drop(sk, skb);
>                 sk_mem_reclaim(sk);
> -               res = true;
> +               if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
> +                   !tcp_under_memory_pressure(sk))
> +                       break;
>         }
> -       return res;
> +
> +       /* Reset SACK state. A conforming SACK implementation will
> +        * do the same at a timeout based retransmit. When a connection
> +        * is in a sad state like this, we care only about integrity
> +        * of the connection not performance.
> +        */
> +       if (tp->rx_opt.sack_ok)
> +               tcp_sack_reset(&tp->rx_opt);
> +       return true;
>  }
>
>  /* Reduce allocated memory if we can, trying to get

Very nice patch, Eric! Thanks.
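
For readers following along outside the kernel tree, here is a minimal user-space sketch of the tail-pruning idea the patch implements. All names here (struct pkt, prune_ofo_tail, the counters) are illustrative stand-ins rather than kernel structures, and the global memory-pressure check is left out; only the "drop from the tail until we are back under the receive-buffer budget" loop is modeled.

/*
 * Illustrative sketch (not kernel code): drop packets from the tail of
 * an out-of-order queue, i.e. the highest sequence numbers first, and
 * stop as soon as memory usage is back under budget, instead of purging
 * the whole queue.
 */
#include <stdio.h>
#include <stdlib.h>

struct pkt {
        size_t truesize;        /* memory charged for this packet */
        struct pkt *prev;       /* toward the head (lower sequence numbers) */
};

/* Drop from the tail until rmem_alloc fits within rcvbuf; return nonzero
 * if anything was freed.  The queue may well be left non-empty: that is
 * the point of the partial prune.
 */
static int prune_ofo_tail(struct pkt **tail, size_t *rmem_alloc, size_t rcvbuf)
{
        int freed = 0;

        while (*tail) {
                struct pkt *victim = *tail;

                *tail = victim->prev;
                *rmem_alloc -= victim->truesize;
                free(victim);
                freed = 1;

                if (*rmem_alloc <= rcvbuf)      /* enough room recovered */
                        break;
        }
        return freed;
}

int main(void)
{
        size_t rcvbuf = 4000, rmem_alloc = 0;
        struct pkt *tail = NULL;

        /* Build a small queue: 6 packets of 1000 bytes each. */
        for (int i = 0; i < 6; i++) {
                struct pkt *p = malloc(sizeof(*p));

                if (!p)
                        return 1;
                p->truesize = 1000;
                p->prev = tail;
                tail = p;
                rmem_alloc += p->truesize;
        }

        prune_ofo_tail(&tail, &rmem_alloc, rcvbuf);
        printf("rmem_alloc after prune: %zu (budget %zu)\n",
               rmem_alloc, rcvbuf);

        /* Free what is left of the queue. */
        while (tail) {
                struct pkt *p = tail;

                tail = p->prev;
                free(p);
        }
        return 0;
}

Dropping from the tail discards the highest sequence numbers first, so the data closest to the hole at rcv_nxt is kept and the hole retains its best chance of being filled by a retransmit, as the new comment in tcp_prune_ofo_queue() explains.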