> I find using icsk->icsk_ack.rcv_mss misleading.
> I would either use TCP_MSS_DEFAULT, or maybe simply 0, since we had
> no samples yet, there is little point to use a magic value.

My point is that, by definition, rcvq_space is used in TCP's internal auto-tuning to 
grow the socket buffers based on how much data the kernel estimates the sender can 
send, so we are talking about an estimate anyway, one that won't be 100% accurate, 
especially during connection initialization. I proposed using TCP_INIT_CWND 
* icsk->icsk_ack.rcv_mss as the initial receive space because it is the minimum that 
the sender can send, assuming they are filling up their initial congestion 
window. In my opinion this shouldn't have a security impact, because the receive 
buffer and receive window won't scale unless the sender sends more than 5360 bytes 
in one round trip; anything lower than that won't trigger TCP autotuning.

Another point is that the proposed patch only affects the initial 
rcvq_space.space; it has no impact on how the receive window and receive buffer 
scale while the connection is ongoing.

I actually considered the options below before settling on my final patch, 
because I am afraid they would either have a higher memory footprint or let the 
connection get stuck later under packet loss.


I am listing them below as well.


A) The patch below would be enough, as the amount of data copied to userspace in 
one round trip is usually more than 50KB, so the window will scale:


# diff -u a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c 
--- a/net/ipv4/tcp_input.c      2020-11-18 19:54:23.624306129 +0000
+++ b/net/ipv4/tcp_input.c      2020-11-18 19:55:05.032259419 +0000
@@ -605,7 +605,7 @@
 
        /* Number of bytes copied to user in last RTT */
        copied = tp->copied_seq - tp->rcvq_space.seq;
-       if (copied <= tp->rcvq_space.space)
+       if (copied <= (tp->rcvq_space.space >> 1))
                goto new_measure;


Pros:
-It will halve the threshold at which we scale up the receive buffer and the 
advertised window, so we shouldn't see the connection get stuck with the current 
default kernel configuration on high-RTT paths.
-The likelihood of it getting stuck later is low, because the change is dynamic 
and affects DRS over the whole connection lifetime, not just the initialization 
phase.

Cons:
-This may have a higher memory footprint, because we start growing the receive 
buffer at a lower threshold than before (a rough comparison of the old and new 
thresholds is sketched below).
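
For illustration, here is a stand-alone sketch of the per-RTT trigger before and
after option (A). The numbers are assumptions for the scenario in the patch
description, not measured values: an initial rcv_wnd of roughly 64KB (from the
"SYN rwin to around 64KB" change) and an MSS of roughly 8949 bytes for a
9001-byte MTU.

/* Stand-alone sketch (not kernel code): per-RTT autotuning trigger
 * before and after halving the comparison in tcp_rcv_space_adjust().
 */
#include <stdio.h>

#define TCP_INIT_CWND 10

static unsigned int min_u32(unsigned int a, unsigned int b)
{
        return a < b ? a : b;
}

int main(void)
{
        unsigned int rcv_wnd = 65535;   /* ~64KB SYN rwin (assumed) */
        unsigned int advmss  = 8949;    /* rough MSS for a 9001-byte MTU (assumed) */
        unsigned int space   = min_u32(rcv_wnd, TCP_INIT_CWND * advmss);

        /* current check: grow only if copied > rcvq_space.space in one RTT */
        printf("current per-RTT trigger:   %u bytes\n", space);
        /* option (A): grow once copied > rcvq_space.space / 2 */
        printf("option (A) per-RTT trigger: %u bytes\n", space >> 1);
        return 0;
}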


################################################################################################################################

B) Otherwise we could look into decreasing the initial rcv_wnd, and accordingly 
rcvq_space.space, by lowering the default ipv4.sysctl_tcp_rmem[1] from 131072 
back to 87380, as it was before 
https://lore.kernel.org/patchwork/patch/1157936/ (a rough sketch of that revert 
is below).
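
For clarity, reverting that default would look roughly like the sketch below in
tcp_init() (net/ipv4/tcp.c), where the 131072 default was introduced; this is an
untested sketch with the hunk context and line numbers omitted, not a proposed
patch:

--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ static void __init tcp_init(void)
-       init_net.ipv4.sysctl_tcp_rmem[1] = 131072;
+       init_net.ipv4.sysctl_tcp_rmem[1] = 87380;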

#################################################################################################################################


C) Another solution would be to bound the current rcvq_space.space by the amount 
copied to userspace in the previous RTT; something like the below would also work.


--- a/net/ipv4/tcp_input.c      2020-11-19 15:43:10.441524021 +0000
+++ b/net/ipv4/tcp_input.c      2020-11-19 15:45:42.772614521 +0000
@@ -649,6 +649,7 @@
        tp->rcvq_space.space = copied;
 
 new_measure:
+       tp->rcvq_space.space = copied;
        tp->rcvq_space.seq = tp->copied_seq;
        tp->rcvq_space.time = tp->tcp_mstamp;
 }

Cons:
-When I emulated packet loss after connection establishment, I could see the 
connection get stuck at low speed for a long time.


On 07/12/2020, 16:09, "Eric Dumazet" <eduma...@google.com> wrote:


    On Mon, Dec 7, 2020 at 4:37 PM Eric Dumazet <eduma...@google.com> wrote:
    >
    > On Mon, Dec 7, 2020 at 12:41 PM Hazem Mohamed Abuelfotoh
    > <abueh...@amazon.com> wrote:
    > >
    > >     Previously receiver buffer auto-tuning starts after receiving
    > >     one advertised window amount of data.After the initial
    > >     receiver buffer was raised by
    > >     commit a337531b942b ("tcp: up initial rmem to 128KB
    > >     and SYN rwin to around 64KB"),the receiver buffer may
    > >     take too long for TCP autotuning to start raising
    > >     the receiver buffer size.
    > >     commit 041a14d26715 ("tcp: start receiver buffer autotuning sooner")
    > >     tried to decrease the threshold at which TCP auto-tuning starts
    > >     but it's doesn't work well in some environments
    > >     where the receiver has large MTU (9001) especially with high RTT
    > >     connections as in these environments rcvq_space.space will be the 
same
    > >     as rcv_wnd so TCP autotuning will never start because
    > >     sender can't send more than rcv_wnd size in one round trip.
    > >     To address this issue this patch is decreasing the initial
    > >     rcvq_space.space so TCP autotuning kicks in whenever the sender is
    > >     able to send more than 5360 bytes in one round trip regardless the
    > >     receiver's configured MTU.
    > >
    > >     Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
    > >     Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner")
    > >
    > > Signed-off-by: Hazem Mohamed Abuelfotoh <abueh...@amazon.com>
    > > ---
    > >  net/ipv4/tcp_input.c | 3 ++-
    > >  1 file changed, 2 insertions(+), 1 deletion(-)
    > >
    > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
    > > index 389d1b340248..f0ffac9e937b 100644
    > > --- a/net/ipv4/tcp_input.c
    > > +++ b/net/ipv4/tcp_input.c
    > > @@ -504,13 +504,14 @@ static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb)
    > >  static void tcp_init_buffer_space(struct sock *sk)
    > >  {
    > >         int tcp_app_win = sock_net(sk)->ipv4.sysctl_tcp_app_win;
    > > +       struct inet_connection_sock *icsk = inet_csk(sk);
    > >         struct tcp_sock *tp = tcp_sk(sk);
    > >         int maxwin;
    > >
    > >         if (!(sk->sk_userlocks & SOCK_SNDBUF_LOCK))
    > >                 tcp_sndbuf_expand(sk);
    > >
    > > -       tp->rcvq_space.space = min_t(u32, tp->rcv_wnd, TCP_INIT_CWND * tp->advmss);
    > > +       tp->rcvq_space.space = min_t(u32, tp->rcv_wnd, TCP_INIT_CWND * icsk->icsk_ack.rcv_mss);
    >
    

    0 will not work, since we use a do_div(grow, tp->rcvq_space.space)

    >
    > Note that if a driver uses 16KB of memory to hold a 1500 bytes packet,
    > then a 10 MSS GRO packet is consuming 160 KB of memory,
    > which is bigger than tcp_rmem[1]. TCP could decide to drop these fat packets.
    >
    > I wonder if your patch does not work around a more fundamental issue,
    > I am still unable to reproduce the issue.



