Re: [net-next] tcp: add TCP_INFO status for failed client TFO

2019-10-22 Thread Yuchung Cheng
On Tue, Oct 22, 2019 at 12:34 PM Jason Baron wrote: > > > > On 10/22/19 2:17 PM, Yuchung Cheng wrote: > > On Mon, Oct 21, 2019 at 7:14 PM Neal Cardwell wrote: > >> > >> On Mon, Oct 21, 2019 at 5:11 PM Jason Baron wrote: > >>> > >>> >

Re: [net-next] tcp: add TCP_INFO status for failed client TFO

2019-10-22 Thread Yuchung Cheng
On Mon, Oct 21, 2019 at 7:14 PM Neal Cardwell wrote: > > On Mon, Oct 21, 2019 at 5:11 PM Jason Baron wrote: > > > > > > > > On 10/21/19 4:36 PM, Eric Dumazet wrote: > > > On Mon, Oct 21, 2019 at 12:53 PM Christoph Paasch > > > wrote: > > >> > > > > > >> Actually, longterm I hope we would be abl

Re: [net-next] tcp: add TCP_INFO status for failed client TFO

2019-10-21 Thread Yuchung Cheng
Thanks for the patch. Detailed comments below On Fri, Oct 18, 2019 at 4:58 PM Neal Cardwell wrote: > > On Fri, Oct 18, 2019 at 3:03 PM Jason Baron wrote: > > > > The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether > > or not data-in-SYN was ack'd on both the client and se

Re: [PATCH net] tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state

2019-09-27 Thread Yuchung Cheng
On Thu, Sep 26, 2019 at 3:42 PM Eric Dumazet wrote: > > Yuchung Cheng and Marek Majkowski independently reported a weird > behavior of TCP_USER_TIMEOUT option when used at connect() time. > > When the TCP_USER_TIMEOUT is reached, tcp_write_timeout() > believes the flow sh

Re: [PATCH v5 2/2] tcp: Add snd_wnd to TCP_INFO

2019-09-13 Thread Yuchung Cheng
rpose of adding an additional __u32 to avoid the > would-be hole caused by the addition of the tcpi_rcvi_ooopack field. > > Signed-off-by: Thomas Higdon > --- Acked-by: Yuchung Cheng > changes since v4: > - clarify comment > include/uapi/linux/tcp.h | 4 > net/ipv4

Re: [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO

2019-09-13 Thread Yuchung Cheng
On Fri, Sep 13, 2019 at 2:53 PM Neal Cardwell wrote: > > On Fri, Sep 13, 2019 at 5:29 PM Yuchung Cheng wrote: > > > What if the comment is shortened up to fit in 80 columns and the units > > > (bytes) are added, something like: > > > > > >

Re: [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO

2019-09-13 Thread Yuchung Cheng
On Fri, Sep 13, 2019 at 2:02 PM Neal Cardwell wrote: > > On Fri, Sep 13, 2019 at 3:36 PM Thomas Higdon wrote: > > > > Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP > > performance problems -- > > > (1) Usually when we're diagnosing TCP performance problems, we do so > >

Re: [PATCH bpf-next 0/8] bpf: TCP RTT sock_ops bpf callback

2019-07-01 Thread Yuchung Cheng
ted-by: Eric Dumazet > Cc: Eric Dumazet > Cc: Priyaranjan Jha > Cc: Yuchung Cheng Acked-by: Yuchung Cheng Thanks! > Cc: Soheil Hassas Yeganeh > > Stanislav Fomichev (8): > bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT > bpf: split shared bpf_tcp_sock

Re: [PATCH net] inet: clear num_timeout reqsk_alloc()

2019-06-19 Thread Yuchung Cheng
net_release+0x1f7/0x270 net/ipv4/af_inet.c:427 > > inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:470 > > __sock_release net/socket.c:601 [inline] > > sock_close+0x156/0x490 net/socket.c:1273 > > __fput+0x4c9/0xba0 fs/file_table.c:280 > > fput+0x37/0x40 fs

[PATCH net] tcp: fix undo spurious SYNACK in passive Fast Open

2019-06-07 Thread Yuchung Cheng
transmit") Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 08a477e74cf3..38dfc308c0fb 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4

Re: [PATCH net-next 0/6] add TFO backup key

2019-05-28 Thread Yuchung Cheng
On Tue, May 28, 2019 at 7:37 AM Jason Baron wrote: > > On 5/24/19 7:17 PM, Yuchung Cheng wrote: > > On Thu, May 23, 2019 at 4:31 PM Yuchung Cheng wrote: > >> > >> On Thu, May 23, 2019 at 12:14 PM David Miller wrote: > >>> > >>> From: Ja

Re: [PATCH net-next 0/6] add TFO backup key

2019-05-24 Thread Yuchung Cheng
On Thu, May 23, 2019 at 4:31 PM Yuchung Cheng wrote: > > On Thu, May 23, 2019 at 12:14 PM David Miller wrote: > > > > From: Jason Baron > > Date: Wed, 22 May 2019 16:39:32 -0400 > > > > > Christoph, Igor, and I have worked on an API that facilitates TFO

Re: [PATCH net-next 0/6] add TFO backup key

2019-05-23 Thread Yuchung Cheng
On Thu, May 23, 2019 at 12:14 PM David Miller wrote: > > From: Jason Baron > Date: Wed, 22 May 2019 16:39:32 -0400 > > > Christoph, Igor, and I have worked on an API that facilitates TFO key > > rotation. This is a follow up to the series that Christoph previously > > posted, with an API that mee

Re: [PATCH net] tcp: fix retrans timestamp on passive Fast Open

2019-05-13 Thread Yuchung Cheng
From: David Miller Date: Fri, May 10, 2019 at 4:41 PM To: Cc: , > From: Yuchung Cheng > Date: Fri, 10 May 2019 16:00:19 -0700 > > > Fixes: 3844718c20d0 ("tcp: properly track retry time on passive Fast Open") > > This is not a valid commit ID. Oops submitting a v2. sorry for the typo

[PATCH v2 net] tcp: fix retrans timestamp on passive Fast Open

2019-05-13 Thread Yuchung Cheng
Any successful loss recovery would reset the timestamp to avoid this issue. Fixes: c7d13c8faa74 ("tcp: properly track retry time on passive Fast Open") Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell --- net/ipv4/tcp_input.c | 3 +++ 1 file changed, 3 insertions(+) diff --gi

[PATCH net] tcp: fix retrans timestamp on passive Fast Open

2019-05-10 Thread Yuchung Cheng
Any successful loss recovery would reset the timestamp to avoid this issue. Fixes: 3844718c20d0 ("tcp: properly track retry time on passive Fast Open") Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell --- net/ipv4/tcp_input.c | 3 +++ 1 file changed, 3 insertions(+) diff --gi

[PATCH net-next 4/8] tcp: undo init congestion window on false SYNACK timeout

2019-04-29 Thread Yuchung Cheng
will still use the default initial congestion (e.g. 10) because tp->undo_marker is reset in tcp_init_metrics(). This is an intentional design because packets are not lost but delayed. This patch only covers regular TCP passive open. Fast Open is supported in the next patch. Signed-off-by: Yuchu

[PATCH net-next 6/8] tcp: undo cwnd on Fast Open spurious SYNACK retransmit

2019-04-29 Thread Yuchung Cheng
code since no data is acknowledged. The fix is to check such case explicitly after tcp_ack() during the ACK processing in SYN_RECV state. In addition this is checked in FIN_WAIT_1 state in case the server closes the socket before handshake completes. Signed-off-by: Yuchung Cheng Signed-off-by: Neal

[PATCH net-next 7/8] tcp: refactor to consolidate TFO passive open code

2019-04-29 Thread Yuchung Cheng
Use a helper to consolidate two identical code blocks for passive TFO. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Signed-off-by: Soheil Hassas Yeganeh Signed-off-by: Eric Dumazet --- net/ipv4/tcp_input.c | 52 +--- 1 file changed, 25

[PATCH net-next 2/8] tcp: undo initial congestion window on false SYN timeout

2019-04-29 Thread Yuchung Cheng
we have to implement a different undo code additionally. The detection also must happen before tcp_ack() as retrans_stamp is reset when SYN is acknowledged. Note this patch covers both active regular and fast open. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Signed-off-by: Eric Duma

[PATCH net-next 5/8] tcp: lower congestion window on Fast Open SYNACK timeout

2019-04-29 Thread Yuchung Cheng
RFC6298. Note that tcp_enter_loss() is called only once during recurring timeouts. This is because during handshake, high_seq and snd_una are the same so tcp_enter_loss() would incorrectly set the undo state variables multiple times. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Signed

[PATCH net-next 8/8] tcp: refactor setting the initial congestion window

2019-04-29 Thread Yuchung Cheng
Relocate the congestion window initialization from tcp_init_metrics() to tcp_init_transfer() to improve code readability. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Signed-off-by: Soheil Hassas Yeganeh Signed-off-by: Eric Dumazet --- net/ipv4/tcp.c | 12

[PATCH net-next 0/8] undo congestion window on spurious SYN or SYNACK timeout

2019-04-29 Thread Yuchung Cheng
for both active and passive as well as Fast Open or regular connections. Yuchung Cheng (8): tcp: avoid unconditional congestion window undo on SYN retransmit tcp: undo initial congestion window on false SYN timeout tcp: better SYNACK sent timestamp tcp: undo init congestion window on

[PATCH net-next 1/8] tcp: avoid unconditional congestion window undo on SYN retransmit

2019-04-29 Thread Yuchung Cheng
ave an incorrect ack sequence number since rcv_nxt has not been updated yet tcp_rcv_synsent_state_process(), the retransmission needs to properly handed by tcp_rcv_fastopen_synack() like before. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Signed-off-by: Eric Dumazet --- net/ipv4/tcp_i

[PATCH net-next 3/8] tcp: better SYNACK sent timestamp

2019-04-29 Thread Yuchung Cheng
Detecting spurious SYNACK timeout using timestamp option requires recording the exact SYNACK skb timestamp. Previously the SYNACK sent timestamp was stamped slightly earlier before the skb was transmitted. This patch uses the SYNACK skb transmission timestamp directly. Signed-off-by: Yuchung

Re: [PATCH v2 bpf-next 5/7] bpf: sysctl for probe_on_drop

2019-04-08 Thread Yuchung Cheng
On Mon, Apr 8, 2019 at 10:07 AM Eric Dumazet wrote: > > > > On 04/08/2019 09:16 AM, Neal Cardwell wrote: > > On Wed, Apr 3, 2019 at 8:13 PM brakmo wrote: > >> > >> When a packet is dropped when calling queue_xmit in __tcp_transmit_skb > >> and packets_out is 0, it is beneficial to set a small pr

Re: [PATCH net-next 00/11] tcp: remove code from tcp_create_openreq_child()

2019-01-17 Thread Yuchung Cheng
pt & syn_data_acked init to tcp_disconnect() > > net/ipv4/tcp.c | 21 - > net/ipv4/tcp_minisocks.c | 34 -- > 2 files changed, 20 insertions(+), 35 deletions(-) > > -- Entire patch set looks great to me! Acked-by: Yuchung Cheng > 2.20.1.321.g9e740568ce-goog >

[PATCH net-next 2/8] tcp: always timestamp on every skb transmission

2019-01-16 Thread Yuchung Cheng
e old time-stamping style before commit 8c72c65b426b ("tcp: update skb->skb_mstamp more carefully") which addresses a problem in computing the elapsed time of a stalled window-probing socket. The problem will be addressed differently in the next patches with a simpler approach. Signe

[PATCH net-next 0/8] improving TCP behavior on host congestion

2019-01-16 Thread Yuchung Cheng
. Then retry more conservatively (twice a second) on local qdisc congestion but abort the sockets according to the system limit. Yuchung Cheng (8): tcp: exit if nothing to retransmit on RTO timeout tcp: always timestamp on every skb transmission tcp: always set retrans_stamp on recovery tcp

[PATCH net-next 4/8] tcp: properly track retry time on passive Fast Open

2019-01-16 Thread Yuchung Cheng
(tp->retrans_stamp is 0), and the socket may abort immediately on the very first FIN timeout, instead of retrying until it passes the system or user specified limit. Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Reviewed-by: Neal Cardwell Reviewed-by: Soheil Hassas Yeganeh --- net/i

[PATCH net-next 1/8] tcp: exit if nothing to retransmit on RTO timeout

2019-01-16 Thread Yuchung Cheng
Previously TCP only warns if its RTO timer fires and the retransmission queue is empty, but it'll cause null pointer reference later on. It's better to avoid such catastrophic failure and simply exit with a warning. Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Reviewe

[PATCH net-next 7/8] tcp: retry more conservatively on local congestion

2019-01-16 Thread Yuchung Cheng
retry more conservatively (500ms) and update the stats properly to reflect these incidents and follow the system limit. Note that this is consistent with the behavior when a keep-alive probe is dropped due to local congestion. Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Reviewed-by: Nea

[PATCH net-next 5/8] tcp: create a helper to model exponential backoff

2019-01-16 Thread Yuchung Cheng
Create a helper to model TCP exponential backoff for the next patch. This is a pure refactor with no behavior change. Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Reviewed-by: Neal Cardwell Reviewed-by: Soheil Hassas Yeganeh --- net/ipv4/tcp_timer.c | 31
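
For readers unfamiliar with the term, here is a minimal sketch of what "modeling exponential backoff" means: adding up the waits of successive timeouts, where each wait doubles until it reaches a ceiling. This is illustrative Python, not the kernel helper added by the patch; the function name and its parameters are hypothetical, and the values 1 and 120 (seconds) are example inputs only.

```python
def total_backoff_time(timeouts, rto_base, rto_max):
    """Sum the waits of `timeouts` consecutive timer expirations.

    Hypothetical helper for illustration: each wait is double the
    previous one, starting at rto_base and clamped to rto_max.
    All values share one time unit (seconds in the example below).
    """
    total = 0
    rto = rto_base
    for _ in range(timeouts):
        total += rto
        # Double the next wait, but never exceed the ceiling.
        rto = min(rto * 2, rto_max)
    return total


# Example inputs (assumed): 1 s base timeout, 120 s ceiling.
print(total_backoff_time(3, 1, 120))   # 1 + 2 + 4
print(total_backoff_time(10, 1, 120))  # doubling until the cap, then 120 each
```

A closed-form version is possible, but the loop keeps the doubling-then-clamping behavior easy to read.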

[PATCH net-next 8/8] tcp: less aggressive window probing on local congestion

2019-01-16 Thread Yuchung Cheng
retry more conservatively (500ms) and update the stats properly to reflect these incidents and follow the system limit. Note that this is consistent with the behaviors when a keep-alive probe or RTO retry is dropped due to local congestion. Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Reviewe

[PATCH net-next 6/8] tcp: simplify window probe aborting on USER_TIMEOUT

2019-01-16 Thread Yuchung Cheng
d the exponential backoff behavior. Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Reviewed-by: Neal Cardwell Reviewed-by: Soheil Hassas Yeganeh --- net/ipv4/tcp_timer.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/net/ipv4/tcp_timer.c b/net

[PATCH net-next 3/8] tcp: always set retrans_stamp on recovery

2019-01-16 Thread Yuchung Cheng
n when the original packet was sent. Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Reviewed-by: Neal Cardwell Reviewed-by: Soheil Hassas Yeganeh --- net/ipv4/tcp_output.c | 9 - net/ipv4/tcp_timer.c | 23 +++ 2 files changed, 7 insertions(+), 25 deletion

[PATCH net] tcp: change txhash on SYN-data timeout

2019-01-08 Thread Yuchung Cheng
retransmission uses a new flow label. This patch removes this undesirable behavior so Fast Open changes the flow label just like the regular connections. This also helps avoid falsely disabling Fast Open on the sender which triggers after two consecutive SYN timeouts on Fast Open. Signed-off-by: Yuchung

[PATCH bpf] bpf: correctly set initial window on active Fast Open sender

2019-01-08 Thread Yuchung Cheng
-data additionally. Fixes: fc7478103c84 ("bpf: Adds support for setting initial cwnd") Signed-off-by: Yuchung Cheng Reviewed-by: Neal Cardwell --- net/core/filter.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/filter.c b/net/core/filter.c index 44

Re: [PATCH net-next 3/5] tcp: Print list of TFO-keys from proc

2018-12-17 Thread Yuchung Cheng
On Mon, Dec 17, 2018 at 3:35 PM Christoph Paasch wrote: > > On 17/12/18 - 08:52:22, Yuchung Cheng wrote: > > On Sun, Dec 16, 2018 at 10:32 PM Eric Dumazet > > wrote: > > > > > > > > > > > > On 12/14/2018 02:40 PM, Christoph Paasch wrot

Re: [PATCH net-next 3/5] tcp: Print list of TFO-keys from proc

2018-12-17 Thread Yuchung Cheng
On Sun, Dec 16, 2018 at 10:32 PM Eric Dumazet wrote: > > > > On 12/14/2018 02:40 PM, Christoph Paasch wrote: > > Print the list of the TFO-keys with a comma separated. For setting the > > keys, we still only allow a single one to be set. > > > > I wonder if some applications expecting current form

[PATCH net] tcp: fix NULL ref in tail loss probe

2018-12-05 Thread Yuchung Cheng
tch the root cause of the inflight accounting inconsistency. Reported-by: Rafael Tinoco Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Signed-off-by: Neal Cardwell --- net/ipv4/tcp_output.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/net/i

Re: [PATCH net] tcp: Do not underestimate rwnd_limited

2018-12-05 Thread Yuchung Cheng
his really means we are rwnd limited. > > > > Fixes: 5615f88614a4 ("tcp: instrument how long TCP is limited by receive > > window") > > Signed-off-by: Eric Dumazet > > Acked-by: Soheil Hassas Yeganeh Reviewed-by: Yuchung Cheng > > Excellent catch! Thank

Re: [PATCH] net: tcp: add correct check for tcp_retransmit_skb()

2018-11-30 Thread Yuchung Cheng
On Fri, Nov 30, 2018 at 10:28 AM Sharath Chandra Vurukala wrote: > > when the tcp_retranmission_timer expires and tcp_retranmsit_skb is > called if the retranmsission fails due to local congestion, > backoff should not incremented. > > tcp_retransmit_skb() returns non-zero negative value in some c

[PATCH net 3/3] tcp: fix SNMP TCP timeout under-estimation

2018-11-28 Thread Yuchung Cheng
. For example the monitoring system needs to collect many other SNMP counters to infer the total amount of timeout events. This patch makes the TCPTIMEOUTS counter simply count all the retransmit timeouts (SYN or data or FIN). Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Signed-off-by

[PATCH net 1/3] tcp: fix off-by-one bug on aborting window-probing socket

2018-11-28 Thread Yuchung Cheng
Previously there was an off-by-one bug in determining when to abort a stalled window-probing socket. This patch fixes that so it is consistent with tcp_write_timeout(). Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Signed-off-by: Neal Cardwell --- net/ipv4/tcp_timer.c | 2 +- 1 file

[PATCH net 2/3] tcp: fix SNMP under-estimation on failed retransmission

2018-11-28 Thread Yuchung Cheng
Previously the SNMP counter LINUX_MIB_TCPRETRANSFAIL is not counting the TSO/GSO properly on failed retransmission. This patch fixes that. Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Signed-off-by: Neal Cardwell --- net/ipv4/tcp_output.c | 2 +- 1 file changed, 1 insertion(+), 1

[PATCH net 0/3] fixes in timeout and retransmission accounting

2018-11-28 Thread Yuchung Cheng
This patch set has assorted fixes of minor accounting issues in timeout, window probe, and retransmission stats. Yuchung Cheng (3): tcp: fix off-by-one bug on aborting window-probing socket tcp: fix SNMP under-estimation on failed retransmission tcp: fix SNMP TCP timeout under-estimation

Re: [PATCH] net: tcp: add correct check for tcp_retransmit_skb()

2018-11-27 Thread Yuchung Cheng
On Mon, Nov 26, 2018 at 1:35 AM, Sharath Chandra Vurukala wrote: > when the tcp_retranmission_timer expires and tcp_retranmsit_skb is > called if the retranmsission fails due to local congestion, > backoff should not incremented. > > tcp_retransmit_skb() returns non-zero negative value in some cas

Re: [PATCH v2 net-next 0/4] tcp: take a bit more care of backlog stress

2018-11-27 Thread Yuchung Cheng
tcp: make tcp_space() aware of socket backlog Great feature! Acked-by: Yuchung Cheng > > > > Eric Dumazet (4): > tcp: hint compiler about sack flows > tcp: take care of compressed acks in tcp_add_reno_sack() > tcp: make tcp_space() aware of socket backlog > tcp: imp

Re: [PATCH iproute2] ss: add support for delivered and delivered_ce fields

2018-11-26 Thread Yuchung Cheng
lastack:2 pacing_rate 431.9Mbps delivery_rate 246.4Mbps > (*) delivered:1469099 delivered_ce:424799 > busy:99231ms unacked:44 rcv_space:14280 rcv_ssthresh:65535 > notsent:2207688 minrtt:0.228 > > Signed-off-by: Eric Dumazet Acked-by: Yuchung Cheng Thank you Eric! > --- > mis

Re: [PATCH net-next 2/3] tcp: implement coalescing on backlog queue

2018-11-22 Thread Yuchung Cheng
On Wed, Nov 21, 2018 at 2:40 PM, Eric Dumazet wrote: > > > On 11/21/2018 02:31 PM, Yuchung Cheng wrote: >> On Wed, Nov 21, 2018 at 9:52 AM, Eric Dumazet wrote: > >>> + >> Really nice! would it make sense to re-use (some of) the similar >> tcp_try_coalesce()

Re: [PATCH net-next 3/3] tcp: implement head drops in backlog queue

2018-11-21 Thread Yuchung Cheng
On Wed, Nov 21, 2018 at 4:18 PM, Eric Dumazet wrote: > On Wed, Nov 21, 2018 at 3:52 PM Eric Dumazet wrote: >> This is basically what the patch does, the while loop breaks when we have >> freed >> just enough skbs. > > Also this is the patch we tested with Jean-Louis on his host, bring > very nic

Re: [PATCH net-next 3/3] tcp: implement head drops in backlog queue

2018-11-21 Thread Yuchung Cheng
On Wed, Nov 21, 2018 at 2:47 PM, Eric Dumazet wrote: > > > On 11/21/2018 02:40 PM, Yuchung Cheng wrote: >> On Wed, Nov 21, 2018 at 9:52 AM, Eric Dumazet wrote: >>> Under high stress, and if GRO or coalescing does not help, >>> we better make room in backlo

Re: [PATCH net-next 1/3] tcp: remove hdrlen argument from tcp_queue_rcv()

2018-11-21 Thread Yuchung Cheng
On Wed, Nov 21, 2018 at 9:52 AM, Eric Dumazet wrote: > Only one caller needs to pull TCP headers, so lets > move __skb_pull() to the caller side. > > Signed-off-by: Eric Dumazet > --- Acked-by: Yuchung Cheng > net/ipv4/tcp_input.c | 13 ++--- > 1 file change

Re: [PATCH net-next 3/3] tcp: implement head drops in backlog queue

2018-11-21 Thread Yuchung Cheng
order. I like the benefit of fast recovery but I am a bit leery about head drop causing HoLB on large read, while tail drops can be repaired by RACK and TLP already. Hmm - > > Signed-off-by: Eric Dumazet > Tested-by: Jean-Louis Dupond > Cc: Neal Cardwell > Cc: Yuchung Cheng

Re: [PATCH net-next 2/3] tcp: implement coalescing on backlog queue

2018-11-21 Thread Yuchung Cheng
e from user thread and softirq, > to give more chances to __release_sock() to complete its work. > > This also helps if we receive many ACK packets, since GRO > does not aggregate them. > > Signed-off-by: Eric Dumazet > Tested-by: Jean-Louis Dupond > Cc: Neal Cardwell

Re: [PATCH net-next 3/3] tcp: get rid of tcp_tso_should_defer() dependency on HZ/jiffies

2018-11-12 Thread Yuchung Cheng
medium rate flows, >> especially when receivers do not use GRO or similar aggregation. >> >> It also reduces bursts for HZ=100 or HZ=250 kernels, making TCP >> behavior more uniform. >> >> Signed-off-by: Eric Dumazet >> Acked-by: Soheil Hassas Yeganeh >> --- > > Nice. Thanks! > > Acked-by: Neal Cardwell Acked-by: Yuchung Cheng Love it > > neal

[PATCH net-next] tcp: refactor DCTCP ECN ACK handling

2018-10-08 Thread Yuchung Cheng
Vegas algorithmas). For example, BBR is experimenting such ECN signal currently https://tinyurl.com/ietf-102-iccrg-bbr2 Signed-off-by: Yuchung Cheng Signed-off-by: Yousuk Seung Signed-off-by: Neal Cardwell Signed-off-by: Eric Dumazet --- net/ipv4/tcp_dctcp.c | 55

Re: WARN_ON in TLP causing RT throttling

2018-10-02 Thread Yuchung Cheng
On Thu, Sep 27, 2018 at 5:16 PM, wrote: > > On 2018-09-27 13:14, Yuchung Cheng wrote: >> >> On Wed, Sep 26, 2018 at 5:09 PM, Eric Dumazet wrote: >>> >>> >>> >>> >>> On 09/26/2018 04:46 PM, stran...@codeaurora.org wrote: >>>

Re: [PATCH net-next] tcp: start receiver buffer autotuning sooner

2018-10-01 Thread Yuchung Cheng
On Mon, Oct 1, 2018 at 3:46 PM, David Miller wrote: > From: Yuchung Cheng > Date: Mon, 1 Oct 2018 15:42:32 -0700 > >> Previously receiver buffer auto-tuning starts after receiving >> one advertised window amount of data. After the initial receiver >> buffer was r

[PATCH net-next] tcp: start receiver buffer autotuning sooner

2018-10-01 Thread Yuchung Cheng
To address this issue, this patch lowers the initial bytes expected to receive roughly the expected sender's initial window. Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") Signed-off-by: Yuchung Cheng Signed-off-by: Wei Wang Signed-off-by: Neal Card

Re: [PATCH net-next v2] tcp: up initial rmem to 128KB and SYN rwin to around 64KB

2018-10-01 Thread Yuchung Cheng
On Sat, Sep 29, 2018 at 11:23 AM, David Miller wrote: > > From: Yuchung Cheng > Date: Fri, 28 Sep 2018 13:09:02 -0700 > > > Previously TCP initial receive buffer is ~87KB by default and > > the initial receive window is ~29KB (20 MSS). This patch changes > > the

[PATCH net-next v2] tcp: up initial rmem to 128KB and SYN rwin to around 64KB

2018-09-28 Thread Yuchung Cheng
nds to increase the buffer to the appropriate level (2x sender congestion window). With this patch TCP memory configuration is more straight-forward and more properly sized to modern high-speed networks by default. Several popular stacks have been announcing 64KB rwin in SYNs as well. Signed-off-by: Y

Re: [PATCH net-next] tcp: up initial rmem to 128KB and SYN rwin to around 64KB

2018-09-27 Thread Yuchung Cheng
On Thu, Sep 27, 2018 at 11:21 AM, Yuchung Cheng wrote: > Previously TCP initial receive buffer is ~87KB by default and > the initial receive window is ~29KB (20 MSS). This patch changes > the two numbers to 128KB and ~64KB (rounding down to the multiples > of MSS) respectively. Th
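
The window figures quoted above can be checked with a little arithmetic. The sketch below is illustrative Python, not kernel code: the helper name is hypothetical, an MSS of 1460 bytes is assumed (typical for Ethernet without TCP options), and 65535 is assumed as the ceiling because it is the largest value an unscaled 16-bit window field can carry.

```python
def round_down_to_mss(window_bytes, mss):
    """Round a window size down to a whole number of segments."""
    return (window_bytes // mss) * mss


MSS = 1460            # assumed: Ethernet MSS without TCP options
MAX_UNSCALED = 65535  # assumed ceiling: largest unscaled 16-bit window

old_rwin = 20 * MSS                              # "~29KB (20 MSS)"
new_rwin = round_down_to_mss(MAX_UNSCALED, MSS)  # "~64KB", a multiple of MSS

print(old_rwin)  # 29200
print(new_rwin)  # 64240
```

With a different MSS the rounded value changes, which is why the subject line says "around 64KB" rather than an exact figure.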

Re: WARN_ON in TLP causing RT throttling

2018-09-27 Thread Yuchung Cheng
On Wed, Sep 26, 2018 at 5:09 PM, Eric Dumazet wrote: > > > > On 09/26/2018 04:46 PM, stran...@codeaurora.org wrote: > > Hi Eric, > > > > Someone recently reported a crash to us on the 4.14.62 kernel where > > excessive > > WARNING prints were spamming the logs and causing watchdog bites. The kern

[PATCH net-next] tcp: up initial rmem to 128KB and SYN rwin to around 64KB

2018-09-27 Thread Yuchung Cheng
e been announcing 64KB rwin in SYNs as well. Signed-off-by: Yuchung Cheng Signed-off-by: Wei Wang Signed-off-by: Neal Cardwell Signed-off-by: Eric Dumazet Reviewed-by: Soheil Hassas Yeganeh --- net/ipv4/tcp.c| 4 ++-- net/ipv4/tcp_input.c | 25 ++--- net/ipv4/tcp_ou

[PATCH net-next] tcp: change IPv6 flow-label upon receiving spurious retransmission

2018-08-29 Thread Yuchung Cheng
ering on the second consecutive spurious RTO, the receiver changes the flow label upon sending a second consecutive DSACK for a sequence number below RCV.NXT. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Signed-off-by: Eric Dumazet --- net/ipv4/tcp.c | 2 ++ net/ipv4/tcp_in

Re: Fw: [Bug 200943] New: Repeating tcp_mark_head_lost in dmesg

2018-08-29 Thread Yuchung Cheng
The good news is that the particular loss recovery code path is disabled by default on 4.18+ kernels by this patch commit b38a51fec1c1f693f03b1aa19d0622123634d4b7 Author: Yuchung Cheng Date: Wed May 16 16:40:11 2018 -0700 tcp: disable RFC6675 loss detection > > [Mon Aug 27 02:16:11 20

[PATCH net-next 3/4] tcp: always ACK immediately on hole repairs

2018-08-09 Thread Yuchung Cheng
. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Signed-off-by: Wei Wang Signed-off-by: Eric Dumazet --- net/ipv4/tcp_input.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b8849588c440..9a09ff3afef2 100644

[PATCH net-next 4/4] tcp: avoid resetting ACK timer upon receiving packet with ECN CWR flag

2018-08-09 Thread Yuchung Cheng
+0 > [ect01] . 4:4(0) ack 5501 +.31 < [ect0] . 5501:6501(1000) ack 4 win 257 +0 > [ect01] . 4:4(0) ack 6501 Fixes: 9aee40006190 ("tcp: ack immediately when a cwr packet arrives") Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell --- net/ipv4/tcp_input.c | 8 -

[PATCH net-next 2/4] tcp: avoid resetting ACK timer in DCTCP

2018-08-09 Thread Yuchung Cheng
flag instead of calling tcp_enter_quickack_mode(). Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Signed-off-by: Wei Wang Signed-off-by: Eric Dumazet --- net/ipv4/tcp_dctcp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4

[PATCH net-next 0/4] new mechanism to ACK immediately

2018-08-09 Thread Yuchung Cheng
protocol states: 1) When a hole is repaired 2) When CE status changes between subsequent data packets received 3) When a data packet carries CWR flag Yuchung Cheng (4): tcp: mandate a one-time immediate ACK tcp: avoid resetting ACK timer in DCTCP tcp: always ACK immediately on hole repairs tcp

[PATCH net-next 1/4] tcp: mandate a one-time immediate ACK

2018-08-09 Thread Yuchung Cheng
tcp_enter_quickack_mode() because we do not want to forget the icsk_ack.pingpong or icsk_ack.ato state. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Signed-off-by: Wei Wang Signed-off-by: Eric Dumazet --- include/net/inet_connection_sock.h | 3 ++- net/ipv4/tcp_input.c | 4

Re: [PATCH net-next] tcp: ack immediately when a cwr packet arrives

2018-07-24 Thread Yuchung Cheng
> >> Signed-off-by: Lawrence Brakmo > >> --- > >> net/ipv4/tcp_input.c | 9 - > >> 1 file changed, 8 insertions(+), 1 deletion(-) > > > > Seems like a nice mechanism to have, IMHO. > > > > Acked-by: Neal Cardwell > > Should t

[PATCH net 2/3] tcp: do not cancel delay-AcK on DCTCP special ACK

2018-07-18 Thread Yuchung Cheng
on in tcp_send_ack and check the actual ack sequence before cancelling the delayed ACK. Further it's safer to pass the ack sequence number as a local variable into tcp_send_ack routine, instead of intercepting tp->rcv_nxt to avoid future bugs like this. Reported-by: Neal Cardwell Signed-off-by:

[PATCH net 3/3] tcp: do not delay ACK in DCTCP upon CE status change

2018-07-18 Thread Yuchung Cheng
, ..., 1) = 1 0.200 > [ect01] P. 1:2(1) ack 1001 0.200 < [ect0] . 1001:2001(1000) ack 2 win 257 +0.005 < [ce] . 2001:3001(1000) ack 2 win 257 +0.000 > [ect01] . 2:2(0) ack 2001 // Previously the ACK below would be delayed by 40ms +0.000 > [ect01] E. 2:2(0) ack 3001 +0.500 < F.

[PATCH net 1/3] tcp: helpers to send special DCTCP ack

2018-07-18 Thread Yuchung Cheng
Refactor and create helpers to send the special ACK in DCTCP. Signed-off-by: Yuchung Cheng Acked-by: Neal Cardwell --- net/ipv4/tcp_output.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index

[PATCH net 0/3] fix DCTCP ECE Ack series

2018-07-18 Thread Yuchung Cheng
This patch set addresses the issue that the existing DCTCP implementation does not fully implement the ACK policy specified in the RFC. This improves the responsiveness to CE status changes, particularly on flows with small inflight. Yuchung Cheng (3): tcp: helpers to send special ack tcp: do not cancel

[PATCH net 2/2] tcp: remove DELAYED ACK events in DCTCP

2018-07-12 Thread Yuchung Cheng
After fixing the way DCTCP tracks delayed ACKs, the delayed-ACK related callbacks are no longer needed. Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Acked-by: Neal Cardwell --- include/net/tcp.h | 2 -- net/ipv4/tcp_dctcp.c | 25 - net/ipv4

[PATCH net 1/2] tcp: fix dctcp delayed ACK schedule

2018-07-12 Thread Yuchung Cheng
rything +0.500 < F. 9501:9501(0) ack 4 win 257 Reported-by: Larry Brakmo Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Acked-by: Neal Cardwell --- net/ipv4/tcp_dctcp.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_dctcp.c b/net/i

[PATCH net 0/2] fix DCTCP delayed ACK

2018-07-12 Thread Yuchung Cheng
This patch series addresses the issue that DCTCP sometimes fails to acknowledge the latest sequence, resulting in a sender timeout if inflight is small. Yuchung Cheng (2): tcp: fix dctcp delayed ACK schedule tcp: remove DELAYED ACK events in DCTCP include/net/tcp.h | 2 -- net/ipv4

Re: [PATCH net-next v3 0/2] tcp: fix high tail latencies in DCTCP

2018-07-09 Thread Yuchung Cheng
On Sat, Jul 7, 2018 at 7:07 AM, Neal Cardwell wrote: > On Sat, Jul 7, 2018 at 7:15 AM David Miller wrote: >> >> From: Lawrence Brakmo >> Date: Tue, 3 Jul 2018 09:26:13 -0700 >> >> > When have observed high tail latencies when using DCTCP for RPCs as >> > compared to using Cubic. For example, in

Re: [PATCH net-next] tcp: expose both send and receive intervals for rate sample

2018-07-09 Thread Yuchung Cheng
e due to ACK compression or decimation. Algorithms > may want to use send rates and receive rates as separate signals. > > Signed-off-by: Deepti Raghavan Acked-by: Yuchung Cheng > --- > include/net/tcp.h | 2 ++ > net/ipv4/tcp_rate.c | 4 > 2 files changed, 6 insertion

Re: [PATCH net-next v2 1/2] tcp: notify when a delayed ack is sent

2018-07-02 Thread Yuchung Cheng
On Mon, Jul 2, 2018 at 2:39 PM, Lawrence Brakmo wrote: > > DCTCP depends on the CA_EVENT_NON_DELAYED_ACK and CA_EVENT_DELAYED_ACK > notifications to keep track if it needs to send an ACK for packets that > were received with a particular ECN state but whose ACK was delayed. > > Under some circumst

[PATCH net] tcp: fix Fast Open key endianness

2018-06-27 Thread Yuchung Cheng
-by: Daniele Iamartino Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Signed-off-by: Neal Cardwell --- net/ipv4/sysctl_net_ipv4.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index

Re: [PATCH net-next v2] tcp: force cwnd at least 2 in tcp_cwnd_reduction

2018-06-27 Thread Yuchung Cheng
On Wed, Jun 27, 2018 at 1:00 PM, Lawrence Brakmo wrote: > > > From: on behalf of Yuchung Cheng > > Date: Wednesday, June 27, 2018 at 9:59 AM > To: Neal Cardwell > Cc: Lawrence Brakmo , Matt Mathis , > Netdev , Kernel Team , Blake > Matheny , Alexei Starovoitov

Re: [PATCH net-next v2] tcp: force cwnd at least 2 in tcp_cwnd_reduction

2018-06-27 Thread Yuchung Cheng
On Wed, Jun 27, 2018 at 8:24 AM, Neal Cardwell wrote: > On Tue, Jun 26, 2018 at 10:34 PM Lawrence Brakmo wrote: >> The only issue is if it is safe to always use 2 or if it is better to >> use min(2, snd_ssthresh) (which could still trigger the problem). > > Always using 2 SGTM. I don't think we n
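The two floors weighed in this exchange can be compared with a minimal sketch. The helper names below are hypothetical and chosen only for illustration; this is not the code of tcp_cwnd_reduction() itself, just the arithmetic of "always 2" versus "min(2, snd_ssthresh)".

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not kernel code. */
static inline uint32_t u32_min(uint32_t a, uint32_t b) { return a < b ? a : b; }
static inline uint32_t u32_max(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Option A: never let the congestion window drop below 2 packets. */
uint32_t cwnd_floor_fixed(uint32_t cwnd)
{
	return u32_max(cwnd, 2);
}

/*
 * Option B: floor of min(2, ssthresh). When ssthresh is 1 the floor is 1,
 * so a window of a single packet remains possible.
 */
uint32_t cwnd_floor_ssthresh(uint32_t cwnd, uint32_t ssthresh)
{
	return u32_max(cwnd, u32_min(2, ssthresh));
}
```

With a window of 1 and an ssthresh of 1, option A yields 2 while option B still yields 1, which is the case the quoted message says "could still trigger the problem".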

Re: [PATCH net-next] tcp: remove one indentation level in tcp_create_openreq_child

2018-06-26 Thread Yuchung Cheng
On Tue, Jun 26, 2018 at 8:45 AM, Eric Dumazet wrote: > Signed-off-by: Eric Dumazet > --- nice refactor! Acked-by: Yuchung Cheng > net/ipv4/tcp_minisocks.c | 223 --- > 1 file changed, 113 insertions(+), 110 deletions(-) > > dif

Re: [PATCH net-next 2/2] tcp: do not aggressively quick ack after ECN events

2018-05-22 Thread Yuchung Cheng
nough. > > This should reduce the extra load noticed in DCTCP environments, > after congestion events. > > This is part 2 of our effort to reduce pure ACK packets. > > Signed-off-by: Eric Dumazet > --- Acked-by: Yuchung Cheng Thanks for this patch. I am still wondering h

Re: [PATCH v3 net-next 3/6] tcp: add SACK compression

2018-05-17 Thread Yuchung Cheng
44 >> values that this commit hard-coded. > >> Signed-off-by: Eric Dumazet >> --- > > Very nice. I like the constants and the min(rcv_rtt, srtt). > > Acked-by: Neal Cardwell Acked-by: Yuchung Cheng Great work. Hopefully this would save middle-boxes from handling TCP-ACK themselves. > > Thanks! > > neal

Re: [PATCH net-next 3/4] tcp: add SACK compression

2018-05-17 Thread Yuchung Cheng
On Thu, May 17, 2018 at 9:59 AM, Yuchung Cheng wrote: > On Thu, May 17, 2018 at 9:41 AM, Neal Cardwell wrote: >> >> On Thu, May 17, 2018 at 11:40 AM Eric Dumazet >> wrote: >> > On 05/17/2018 08:14 AM, Neal Cardwell wrote: >> > > Is there a particular mo

Re: [PATCH net-next 3/4] tcp: add SACK compression

2018-05-17 Thread Yuchung Cheng
On Thu, May 17, 2018 at 9:41 AM, Neal Cardwell wrote: > > On Thu, May 17, 2018 at 11:40 AM Eric Dumazet > wrote: > > On 05/17/2018 08:14 AM, Neal Cardwell wrote: > > > Is there a particular motivation for the cap of 127? IMHO 127 ACKs is > quite > > > a few to compress. Experience seems to show t

[PATCH net-next 5/8] tcp: new helper tcp_timeout_mark_lost

2018-05-16 Thread Yuchung Cheng
Refactor using a new helper, tcp_timeout_mark_lost(), that marks packets lost upon RTO. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Reviewed-by: Eric Dumazet Reviewed-by: Soheil Hassas Yeganeh Reviewed-by: Priyaranjan Jha --- net/ipv4/tcp_input.c | 50

[PATCH net-next 1/8] tcp: support DUPACK threshold in RACK

2018-05-16 Thread Yuchung Cheng
3*MSS). Also the minimum reordering window is reduced from 1 msec to 0 to recover quicker on short RTT transfers. Therefore RACK is more aggressive in marking packets lost during recovery to reduce the reordering window timeouts. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Review

[PATCH net-next 4/8] tcp: account lost retransmit after timeout

2018-05-16 Thread Yuchung Cheng
simplifies the RTO code by sharing much of the logic with Fast Recovery. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Reviewed-by: Eric Dumazet Reviewed-by: Soheil Hassas Yeganeh Reviewed-by: Priyaranjan Jha --- include/net/tcp.h | 1 + net/ipv4/tcp_input.c| 18

[PATCH net-next 8/8] tcp: don't mark recently sent packets lost on RTO

2018-05-16 Thread Yuchung Cheng
start from one packet (with Cubic congestion control). This commit was tested in an A/B test with Google web servers, and showed a reduction of 2% in (spurious) retransmits post timeout (SlowStartRetrans), and correspondingly reduced DSACKs (DSACKIgnoredOld) by 7%. Signed-off-by: Yuchung Cheng

[PATCH net-next 7/8] tcp: new helper tcp_rack_skb_timeout

2018-05-16 Thread Yuchung Cheng
Create and export a new helper tcp_rack_skb_timeout and move tcp_is_rack to prepare the final RTO change. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Reviewed-by: Eric Dumazet Reviewed-by: Soheil Hassas Yeganeh Reviewed-by: Priyaranjan Jha --- include/net/tcp.h | 2

[PATCH net-next 6/8] tcp: separate loss marking and state update on RTO

2018-05-16 Thread Yuchung Cheng
Fast Recovery s.t. the inflight is updated first before tcp_enter_recovery flips state to CA_Recovery. 2) avoid intertwining loss marking with state update, making the code more readable. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Reviewed-by: Eric Dumazet Reviewed-by: Soheil

[PATCH net-next 2/8] tcp: disable RFC6675 loss detection

2018-05-16 Thread Yuchung Cheng
This patch disables RFC6675 loss detection and makes sysctl net.ipv4.tcp_recovery = 1 control a binary choice between RACK (1) and RFC6675 (0). Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Reviewed-by: Eric Dumazet Reviewed-by: Soheil Hassas Yeganeh Reviewed-by: Priyaranjan Jha
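As a rough illustration of the binary choice described above, the sketch below maps a tcp_recovery value to one of the two detectors. The enum, the function name, and the assumption that the choice is carried in the low bit (0x1) of the sysctl value are illustrative and are not taken from the patch.

```c
/* Hypothetical names for illustration; the patch itself is not shown here. */
enum loss_detection {
	LOSS_DETECT_RFC6675 = 0,
	LOSS_DETECT_RACK = 1,
};

/*
 * Assumption: the RACK/RFC6675 choice lives in the low bit of
 * net.ipv4.tcp_recovery, so 1 selects RACK and 0 selects RFC6675.
 */
enum loss_detection pick_loss_detection(int tcp_recovery)
{
	return (tcp_recovery & 0x1) ? LOSS_DETECT_RACK : LOSS_DETECT_RFC6675;
}
```

Under that assumption, a value of 1 selects RACK and a value of 0 falls back to RFC6675.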

[PATCH net-next 3/8] tcp: simpler NewReno implementation

2018-05-16 Thread Yuchung Cheng
Linux. It should not be confused with the Reno (AIMD) congestion control. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Reviewed-by: Eric Dumazet Reviewed-by: Soheil Hassas Yeganeh Reviewed-by: Priyaranjan Jha --- include/net/tcp.h | 1 + net/ipv4/tcp_input.c| 19
