On Tue, Oct 22, 2019 at 12:34 PM Jason Baron wrote:
>
>
>
> On 10/22/19 2:17 PM, Yuchung Cheng wrote:
> > On Mon, Oct 21, 2019 at 7:14 PM Neal Cardwell wrote:
> >>
> >> On Mon, Oct 21, 2019 at 5:11 PM Jason Baron wrote:
> >>>
> >>>
On Mon, Oct 21, 2019 at 7:14 PM Neal Cardwell wrote:
>
> On Mon, Oct 21, 2019 at 5:11 PM Jason Baron wrote:
> >
> >
> >
> > On 10/21/19 4:36 PM, Eric Dumazet wrote:
> > > On Mon, Oct 21, 2019 at 12:53 PM Christoph Paasch
> > > wrote:
> > >>
> > >
> > >> Actually, longterm I hope we would be abl
Thanks for the patch. Detailed comments below
On Fri, Oct 18, 2019 at 4:58 PM Neal Cardwell wrote:
>
> On Fri, Oct 18, 2019 at 3:03 PM Jason Baron wrote:
> >
> > The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether
> > or not data-in-SYN was ack'd on both the client and se
On Thu, Sep 26, 2019 at 3:42 PM Eric Dumazet wrote:
>
> Yuchung Cheng and Marek Majkowski independently reported a weird
> behavior of TCP_USER_TIMEOUT option when used at connect() time.
>
> When the TCP_USER_TIMEOUT is reached, tcp_write_timeout()
> believes the flow sh
rpose of adding an additional __u32 to avoid the
> would-be hole caused by the addition of the tcpi_rcvi_ooopack field.
>
> Signed-off-by: Thomas Higdon
> ---
Acked-by: Yuchung Cheng
> changes since v4:
> - clarify comment
> include/uapi/linux/tcp.h | 4
> net/ipv4
On Fri, Sep 13, 2019 at 2:53 PM Neal Cardwell wrote:
>
> On Fri, Sep 13, 2019 at 5:29 PM Yuchung Cheng wrote:
> > > What if the comment is shortened up to fit in 80 columns and the units
> > > (bytes) are added, something like:
> > >
> > >
On Fri, Sep 13, 2019 at 2:02 PM Neal Cardwell wrote:
>
> On Fri, Sep 13, 2019 at 3:36 PM Thomas Higdon wrote:
> >
> > Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
> > performance problems --
> > > (1) Usually when we're diagnosing TCP performance problems, we do so
> >
ted-by: Eric Dumazet
> Cc: Eric Dumazet
> Cc: Priyaranjan Jha
> Cc: Yuchung Cheng
Acked-by: Yuchung Cheng
Thanks!
> Cc: Soheil Hassas Yeganeh
>
> Stanislav Fomichev (8):
> bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT
> bpf: split shared bpf_tcp_sock
net_release+0x1f7/0x270 net/ipv4/af_inet.c:427
> > inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:470
> > __sock_release net/socket.c:601 [inline]
> > sock_close+0x156/0x490 net/socket.c:1273
> > __fput+0x4c9/0xba0 fs/file_table.c:280
> > fput+0x37/0x40 fs
transmit")
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
---
net/ipv4/tcp_input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 08a477e74cf3..38dfc308c0fb 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4
On Tue, May 28, 2019 at 7:37 AM Jason Baron wrote:
>
> On 5/24/19 7:17 PM, Yuchung Cheng wrote:
> > On Thu, May 23, 2019 at 4:31 PM Yuchung Cheng wrote:
> >>
> >> On Thu, May 23, 2019 at 12:14 PM David Miller wrote:
> >>>
> >>> From: Ja
On Thu, May 23, 2019 at 4:31 PM Yuchung Cheng wrote:
>
> On Thu, May 23, 2019 at 12:14 PM David Miller wrote:
> >
> > From: Jason Baron
> > Date: Wed, 22 May 2019 16:39:32 -0400
> >
> > > Christoph, Igor, and I have worked on an API that facilitates TFO
On Thu, May 23, 2019 at 12:14 PM David Miller wrote:
>
> From: Jason Baron
> Date: Wed, 22 May 2019 16:39:32 -0400
>
> > Christoph, Igor, and I have worked on an API that facilitates TFO key
> > rotation. This is a follow up to the series that Christoph previously
> > posted, with an API that mee
From: David Miller
Date: Fri, May 10, 2019 at 4:41 PM
To:
Cc: ,
> From: Yuchung Cheng
> Date: Fri, 10 May 2019 16:00:19 -0700
>
> > Fixes: 3844718c20d0 ("tcp: properly track retry time on passive Fast Open")
>
> This is not a valid commit ID.
Oops, submitting a v2. Sorry for the typo.
Any
successful loss recovery would reset the timestamp to avoid this
issue.
Fixes: c7d13c8faa74 ("tcp: properly track retry time on passive Fast Open")
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
---
net/ipv4/tcp_input.c | 3 +++
1 file changed, 3 insertions(+)
diff --gi
Any
successful loss recovery would reset the timestamp to avoid this
issue.
Fixes: 3844718c20d0 ("tcp: properly track retry time on passive Fast Open")
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
---
net/ipv4/tcp_input.c | 3 +++
1 file changed, 3 insertions(+)
diff --gi
will still use the default
initial congestion window (e.g. 10) because tp->undo_marker is reset in
tcp_init_metrics(). This is an intentional design because packets
are not lost but delayed.
This patch only covers regular TCP passive open. Fast Open is
supported in the next patch.
Signed-off-by: Yuchu
code since no data is acknowledged. The fix is to
check for such a case explicitly after tcp_ack() during ACK processing
in SYN_RECV state. In addition this is checked in FIN_WAIT_1 state
in case the server closes the socket before handshake completes.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal
Use a helper to consolidate two identical code blocks for passive TFO.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Eric Dumazet
---
net/ipv4/tcp_input.c | 52 +---
1 file changed, 25
we
have to implement separate undo code as well. The detection
also must happen before tcp_ack() as retrans_stamp is reset when
SYN is acknowledged.
Note this patch covers both active regular and fast open.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Eric Duma
RFC6298.
Note that tcp_enter_loss() is called only once during recurring
timeouts. This is because during handshake, high_seq and snd_una
are the same, so tcp_enter_loss() would incorrectly set the undo state
variables multiple times.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed
Relocate the congestion window initialization from tcp_init_metrics()
to tcp_init_transfer() to improve code readability.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Eric Dumazet
---
net/ipv4/tcp.c | 12
for both active and
passive as well as Fast Open or regular connections.
Yuchung Cheng (8):
tcp: avoid unconditional congestion window undo on SYN retransmit
tcp: undo initial congestion window on false SYN timeout
tcp: better SYNACK sent timestamp
tcp: undo init congestion window on
ave an incorrect ack sequence number since
rcv_nxt has not yet been updated in tcp_rcv_synsent_state_process(), the
retransmission needs to be properly handled by tcp_rcv_fastopen_synack()
like before.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
---
net/ipv4/tcp_i
Detecting spurious SYNACK timeout using timestamp option requires
recording the exact SYNACK skb timestamp. Previously the SYNACK
sent timestamp was stamped slightly before the skb
was transmitted. This patch uses the SYNACK skb transmission
timestamp directly.
Signed-off-by: Yuchung
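A minimal sketch of the timestamp-based spurious SYN/SYNACK timeout check, assuming the receiver has the echoed TSecr from the ACK and the TSval of the first retransmission. The function and parameter names are hypothetical, not the kernel's; ts_before() only imitates the wrap-safe style of the kernel's before() helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is earlier than b" for 32-bit timestamps. */
static bool ts_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical check: the handshake packet timed out and was first
 * retransmitted at retrans_ts. If the peer's ACK echoes (TSecr) a value
 * older than that retransmission, it acknowledges the original packet,
 * so the timeout was spurious (the packet was delayed, not lost). */
static bool syn_timeout_was_spurious(uint32_t retrans_ts, bool saw_tstamp,
				     uint32_t tsecr)
{
	return retrans_ts != 0 && saw_tstamp && tsecr != 0 &&
	       ts_before(tsecr, retrans_ts);
}
```

This is why the exact transmit timestamp matters: if the recorded stamp is earlier than the real send time, the comparison can misclassify a genuine loss or a genuine delay.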
On Mon, Apr 8, 2019 at 10:07 AM Eric Dumazet wrote:
>
>
>
> On 04/08/2019 09:16 AM, Neal Cardwell wrote:
> > On Wed, Apr 3, 2019 at 8:13 PM brakmo wrote:
> >>
> >> When a packet is dropped when calling queue_xmit in __tcp_transmit_skb
> >> and packets_out is 0, it is beneficial to set a small pr
pt & syn_data_acked init to tcp_disconnect()
>
> net/ipv4/tcp.c | 21 -
> net/ipv4/tcp_minisocks.c | 34 --
> 2 files changed, 20 insertions(+), 35 deletions(-)
>
> --
Entire patch set looks great to me!
Acked-by: Yuchung Cheng
> 2.20.1.321.g9e740568ce-goog
>
e old time-stamping
style before commit 8c72c65b426b ("tcp: update skb->skb_mstamp more
carefully") which addresses a problem in computing the elapsed time
of a stalled window-probing socket. The problem will be addressed
differently in the next patches with a simpler approach.
Signe
. Then retry more conservatively (twice a second) on local
qdisc congestion but abort the sockets according to the system limit.
Yuchung Cheng (8):
tcp: exit if nothing to retransmit on RTO timeout
tcp: always timestamp on every skb transmission
tcp: always set retrans_stamp on recovery
tcp
(tp->retrans_stamp is
0), and the socket may abort immediately on the very first FIN
timeout, instead of retrying until it passes the system or user
specified limit.
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Reviewed-by: Neal Cardwell
Reviewed-by: Soheil Hassas Yeganeh
---
net/i
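To illustrate why a zero retrans_stamp is harmful: a hypothetical abort check measures elapsed time from the start of recovery, so with a start of 0 the "elapsed" time is just the current clock value and usually exceeds the limit on the very first timeout. The helpers, the millisecond clock, and the use of 0 as "unset" are assumptions for illustration, not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical abort check: give up once the time since recovery
 * started reaches the limit. Unsigned subtraction keeps it wrap-safe. */
static bool should_abort(uint32_t now_ms, uint32_t start_ms, uint32_t limit_ms)
{
	return (uint32_t)(now_ms - start_ms) >= limit_ms;
}

/* Hypothetical fix: record the start of recovery on the first timeout
 * if no retransmission has stamped it yet (0 means "unset" here). */
static void stamp_recovery_start(uint32_t *start_ms, uint32_t now_ms)
{
	if (*start_ms == 0)
		*start_ms = now_ms;
}
```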
Previously TCP only warned if its RTO timer fired while the
retransmission queue was empty, but that would cause a null pointer
dereference later on. It's better to avoid such a catastrophic failure
and simply exit with a warning.
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Reviewe
retry more conservatively (500ms)
and update the stats properly to reflect these incidents and follow
the system limit. Note that this is consistent with the behavior
when a keep-alive probe is dropped due to local congestion.
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Reviewed-by: Nea
Create a helper to model TCP exponential backoff for the next patch.
This is a pure refactor with no behavior change.
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Reviewed-by: Neal Cardwell
Reviewed-by: Soheil Hassas Yeganeh
---
net/ipv4/tcp_timer.c | 31
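A sketch of what "modeling TCP exponential backoff" computes: the total time that elapses before a given number of backed-off timeouts have expired, with each timeout doubling until it hits a cap. The function name is hypothetical; the 120 s cap is an assumption chosen to match Linux's TCP_RTO_MAX.

```c
#include <stdint.h>

#define RTO_MAX_MS 120000u /* assumed cap, matching Linux's TCP_RTO_MAX (120 s) */

/* Hypothetical helper: total time (ms) until the `boundary`-th backed-off
 * retransmission times out, i.e. the sum of rto_base * 2^i for
 * i = 0..boundary, with each individual timeout capped at RTO_MAX_MS. */
static uint64_t model_backoff_ms(unsigned int boundary, uint32_t rto_base_ms)
{
	uint64_t total = 0;
	uint64_t rto = rto_base_ms;

	for (unsigned int i = 0; i <= boundary; i++) {
		total += rto < RTO_MAX_MS ? rto : RTO_MAX_MS;
		if (rto < RTO_MAX_MS)
			rto *= 2; /* exponential backoff until the cap is reached */
	}
	return total;
}
```

With a 200 ms base and 15 retries (the usual tcp_retries2 default), this gives 924600 ms, about 15.4 minutes. Modeling the time this way lets an abort decision use the retry limit even when some retransmissions never left the host.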
retry more conservatively
(500ms) and update the stats properly to reflect these incidents
and follow the system limit. Note that this is consistent with
the behaviors when a keep-alive probe or RTO retry is dropped
due to local congestion.
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Reviewe
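A minimal sketch of the timer decision after an RTO retransmission, assuming a boolean that reports whether the packet was dropped locally (e.g. by the qdisc). The struct and function names are hypothetical. On a local drop it does not back off and simply re-arms a 500 ms retry; counting these attempts toward the system abort limit is left out here.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOCAL_RETRY_MS 500u /* retry "twice a second" on local congestion */

struct rto_state {
	uint32_t rto_ms;        /* current retransmission timeout */
	unsigned int backoff;   /* number of exponential backoffs so far */
	uint32_t next_timer_ms; /* delay until the timer fires again */
};

/* Hypothetical hook run after each timeout-driven retransmission attempt. */
static void after_rto_retransmit(struct rto_state *s, bool dropped_locally,
				 uint32_t rto_max_ms)
{
	if (dropped_locally) {
		/* The packet never left the host: keep RTO and backoff as-is. */
		s->next_timer_ms = LOCAL_RETRY_MS;
		return;
	}
	s->backoff++;
	s->rto_ms = (s->rto_ms * 2 < rto_max_ms) ? s->rto_ms * 2 : rto_max_ms;
	s->next_timer_ms = s->rto_ms;
}
```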
d the exponential backoff behavior.
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Reviewed-by: Neal Cardwell
Reviewed-by: Soheil Hassas Yeganeh
---
net/ipv4/tcp_timer.c | 14 +++---
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/tcp_timer.c b/net
n when the
original packet was sent.
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Reviewed-by: Neal Cardwell
Reviewed-by: Soheil Hassas Yeganeh
---
net/ipv4/tcp_output.c | 9 -
net/ipv4/tcp_timer.c | 23 +++
2 files changed, 7 insertions(+), 25 deletion
retransmission uses a new flow label.
This patch removes this undesirable behavior so Fast Open changes
the flow label just like regular connections. This also helps
avoid falsely disabling Fast Open on the sender, which is triggered
after two consecutive SYN timeouts on Fast Open.
Signed-off-by: Yuchung
-data additionally.
Fixes: fc7478103c84 ("bpf: Adds support for setting initial cwnd")
Signed-off-by: Yuchung Cheng
Reviewed-by: Neal Cardwell
---
net/core/filter.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 44
On Mon, Dec 17, 2018 at 3:35 PM Christoph Paasch wrote:
>
> On 17/12/18 - 08:52:22, Yuchung Cheng wrote:
> > On Sun, Dec 16, 2018 at 10:32 PM Eric Dumazet
> > wrote:
> > >
> > >
> > >
> > > On 12/14/2018 02:40 PM, Christoph Paasch wrot
On Sun, Dec 16, 2018 at 10:32 PM Eric Dumazet wrote:
>
>
>
> On 12/14/2018 02:40 PM, Christoph Paasch wrote:
> > Print the list of TFO keys, comma-separated. For setting the
> > keys, we still only allow a single one to be set.
> >
>
> I wonder if some applications expecting current form
tch the root cause of the inflight
accounting inconsistency.
Reported-by: Rafael Tinoco
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Signed-off-by: Neal Cardwell
---
net/ipv4/tcp_output.c | 11 +++
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/i
his really means we are rwnd limited.
> >
> > Fixes: 5615f88614a4 ("tcp: instrument how long TCP is limited by receive
> > window")
> > Signed-off-by: Eric Dumazet
>
> Acked-by: Soheil Hassas Yeganeh
Reviewed-by: Yuchung Cheng
>
> Excellent catch! Thank
On Fri, Nov 30, 2018 at 10:28 AM Sharath Chandra Vurukala
wrote:
>
> When the tcp_retransmit_timer expires and tcp_retransmit_skb is
> called, if the retransmission fails due to local congestion,
> backoff should not be incremented.
>
> tcp_retransmit_skb() returns non-zero negative value in some c
. For
example, the monitoring system needs to collect many other SNMP counters
to infer the total number of timeout events. This patch makes the TCPTIMEOUTS
counter simply count all retransmission timeouts (SYN, data, or FIN).
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Signed-off-by
Previously there was an off-by-one bug in determining when to abort
a stalled window-probing socket. This patch fixes that so it is
consistent with tcp_write_timeout().
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Signed-off-by: Neal Cardwell
---
net/ipv4/tcp_timer.c | 2 +-
1 file
Previously the SNMP counter LINUX_MIB_TCPRETRANSFAIL did not count
TSO/GSO packets properly on failed retransmissions. This patch fixes that.
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Signed-off-by: Neal Cardwell
---
net/ipv4/tcp_output.c | 2 +-
1 file changed, 1 insertion(+), 1
This patch set has assorted fixes for minor accounting issues in
timeout, window probe, and retransmission stats.
Yuchung Cheng (3):
tcp: fix off-by-one bug on aborting window-probing socket
tcp: fix SNMP under-estimation on failed retransmission
tcp: fix SNMP TCP timeout under-estimation
On Mon, Nov 26, 2018 at 1:35 AM, Sharath Chandra Vurukala
wrote:
> When the tcp_retransmit_timer expires and tcp_retransmit_skb is
> called, if the retransmission fails due to local congestion,
> backoff should not be incremented.
>
> tcp_retransmit_skb() returns non-zero negative value in some cas
tcp: make tcp_space() aware of socket backlog
Great feature!
Acked-by: Yuchung Cheng
>
>
>
> Eric Dumazet (4):
> tcp: hint compiler about sack flows
> tcp: take care of compressed acks in tcp_add_reno_sack()
> tcp: make tcp_space() aware of socket backlog
> tcp: imp
lastack:2 pacing_rate 431.9Mbps delivery_rate 246.4Mbps
> (*) delivered:1469099 delivered_ce:424799
> busy:99231ms unacked:44 rcv_space:14280 rcv_ssthresh:65535
> notsent:2207688 minrtt:0.228
>
> Signed-off-by: Eric Dumazet
Acked-by: Yuchung Cheng
Thank you Eric!
> ---
> mis
On Wed, Nov 21, 2018 at 2:40 PM, Eric Dumazet wrote:
>
>
> On 11/21/2018 02:31 PM, Yuchung Cheng wrote:
>> On Wed, Nov 21, 2018 at 9:52 AM, Eric Dumazet wrote:
>
>>> +
>> Really nice! would it make sense to re-use (some of) the similar
>> tcp_try_coalesce()
On Wed, Nov 21, 2018 at 4:18 PM, Eric Dumazet wrote:
> On Wed, Nov 21, 2018 at 3:52 PM Eric Dumazet wrote:
>> This is basically what the patch does, the while loop breaks when we have
>> freed
>> just enough skbs.
>
> Also this is the patch we tested with Jean-Louis on his host, bring
> very nic
On Wed, Nov 21, 2018 at 2:47 PM, Eric Dumazet wrote:
>
>
> On 11/21/2018 02:40 PM, Yuchung Cheng wrote:
>> On Wed, Nov 21, 2018 at 9:52 AM, Eric Dumazet wrote:
>>> Under high stress, and if GRO or coalescing does not help,
>>> we better make room in backlo
On Wed, Nov 21, 2018 at 9:52 AM, Eric Dumazet wrote:
> Only one caller needs to pull TCP headers, so lets
> move __skb_pull() to the caller side.
>
> Signed-off-by: Eric Dumazet
> ---
Acked-by: Yuchung Cheng
> net/ipv4/tcp_input.c | 13 ++---
> 1 file change
order.
I like the benefit of fast recovery, but I am a bit leery about head
drop causing head-of-line blocking (HoLB) on large reads, while tail drops
can already be repaired by RACK and TLP. Hmm -
>
> Signed-off-by: Eric Dumazet
> Tested-by: Jean-Louis Dupond
> Cc: Neal Cardwell
> Cc: Yuchung Cheng
e from user thread and softirq,
> to give more chances to __release_sock() to complete its work.
>
> This also helps if we receive many ACK packets, since GRO
> does not aggregate them.
>
> Signed-off-by: Eric Dumazet
> Tested-by: Jean-Louis Dupond
> Cc: Neal Cardwell
medium rate flows,
>> especially when receivers do not use GRO or similar aggregation.
>>
>> It also reduces bursts for HZ=100 or HZ=250 kernels, making TCP
>> behavior more uniform.
>>
>> Signed-off-by: Eric Dumazet
>> Acked-by: Soheil Hassas Yeganeh
>> ---
>
> Nice. Thanks!
>
> Acked-by: Neal Cardwell
Acked-by: Yuchung Cheng
Love it
>
> neal
Vegas algorithms). For example, BBR is
currently experimenting with such an ECN signal
https://tinyurl.com/ietf-102-iccrg-bbr2
Signed-off-by: Yuchung Cheng
Signed-off-by: Yousuk Seung
Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
---
net/ipv4/tcp_dctcp.c | 55
On Thu, Sep 27, 2018 at 5:16 PM, wrote:
>
> On 2018-09-27 13:14, Yuchung Cheng wrote:
>>
>> On Wed, Sep 26, 2018 at 5:09 PM, Eric Dumazet wrote:
>>>
>>>
>>>
>>>
>>> On 09/26/2018 04:46 PM, stran...@codeaurora.org wrote:
On Mon, Oct 1, 2018 at 3:46 PM, David Miller wrote:
> From: Yuchung Cheng
> Date: Mon, 1 Oct 2018 15:42:32 -0700
>
>> Previously receiver buffer auto-tuning starts after receiving
>> one advertised window amount of data. After the initial receiver
>> buffer was r
To address this issue, this patch lowers
the initial number of bytes expected to be received to roughly the
sender's expected initial window.
Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Signed-off-by: Yuchung Cheng
Signed-off-by: Wei Wang
Signed-off-by: Neal Card
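A sketch of "lowering the initial bytes expected" so auto-tuning starts sooner: the receiver waits for the smallest of its slow-start threshold, its advertised window, and roughly one sender initial window. The 10-segment initial window and all names here are assumptions for illustration, not the patch itself.

```c
#include <stdint.h>

#define ASSUMED_INIT_CWND 10u /* assumed sender initial window, in segments */

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Hypothetical sketch: bytes the receiver expects before its first
 * buffer auto-tuning measurement, capped at roughly the sender's
 * initial window rather than a full advertised window. */
static uint32_t initial_rcvq_space(uint32_t rcv_ssthresh, uint32_t rcv_wnd,
				   uint32_t advmss)
{
	return min_u32(min_u32(rcv_ssthresh, rcv_wnd), ASSUMED_INIT_CWND * advmss);
}
```

With a ~64KB window and a 1460-byte MSS, auto-tuning would start after about 14600 bytes instead of after a full window's worth of data.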
On Sat, Sep 29, 2018 at 11:23 AM, David Miller wrote:
>
> From: Yuchung Cheng
> Date: Fri, 28 Sep 2018 13:09:02 -0700
>
> > Previously TCP initial receive buffer is ~87KB by default and
> > the initial receive window is ~29KB (20 MSS). This patch changes
> > the
nds to increase the buffer to the
appropriate level (2x sender congestion window).
With this patch, TCP memory configuration is more straightforward and
more properly sized to modern high-speed networks by default. Several
popular stacks have been announcing 64KB rwin in SYNs as well.
Signed-off-by: Y
On Thu, Sep 27, 2018 at 11:21 AM, Yuchung Cheng wrote:
> Previously TCP initial receive buffer is ~87KB by default and
> the initial receive window is ~29KB (20 MSS). This patch changes
> the two numbers to 128KB and ~64KB (rounding down to the multiples
> of MSS) respectively. Th
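The arithmetic behind "~64KB (rounding down to the multiples of MSS)", sketched under the assumption that the starting point is the largest unscaled window, 65535 bytes. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch, assuming the ~64KB SYN window is the largest
 * unscaled window (65535 bytes) rounded down to a whole number of MSS. */
static uint32_t initial_syn_rwin(uint32_t mss)
{
	const uint32_t max_unscaled = 65535u;

	return (max_unscaled / mss) * mss; /* integer division rounds down */
}
```

For a 1460-byte MSS this yields 44 * 1460 = 64240 bytes; for comparison, the old ~29KB window is 20 * 1460 = 29200 bytes.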
On Wed, Sep 26, 2018 at 5:09 PM, Eric Dumazet wrote:
>
>
>
> On 09/26/2018 04:46 PM, stran...@codeaurora.org wrote:
> > Hi Eric,
> >
> > Someone recently reported a crash to us on the 4.14.62 kernel where
> > excessive
> > WARNING prints were spamming the logs and causing watchdog bites. The kern
e been announcing 64KB rwin in SYNs as well.
Signed-off-by: Yuchung Cheng
Signed-off-by: Wei Wang
Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
Reviewed-by: Soheil Hassas Yeganeh
---
net/ipv4/tcp.c| 4 ++--
net/ipv4/tcp_input.c | 25 ++---
net/ipv4/tcp_ou
ering on the second consecutive spurious
RTO, the receiver changes the flow label upon sending a second
consecutive DSACK for a sequence number below RCV.NXT.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
---
net/ipv4/tcp.c | 2 ++
net/ipv4/tcp_in
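A sketch of the receiver-side rule "change the flow label upon sending a second consecutive DSACK for a sequence number below RCV.NXT". The counter, struct, and label-update scheme are hypothetical: a real stack would pick a fresh random label, so the increment below is only a deterministic stand-in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is before b" for sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

struct rcv_state {
	uint32_t rcv_nxt;                /* next expected sequence number */
	unsigned int consecutive_dsacks; /* hypothetical counter */
	uint32_t flow_label;             /* 20-bit IPv6 flow label */
};

/* Hypothetical hook for each incoming data segment [seq, end_seq).
 * A segment entirely below rcv_nxt is a duplicate that will be DSACKed;
 * on the second such DSACK in a row, the flow label changes so the
 * receiver's packets may be hashed onto a different path. */
static void on_data_segment(struct rcv_state *s, uint32_t seq, uint32_t end_seq)
{
	if (!seq_before(s->rcv_nxt, end_seq)) {
		if (++s->consecutive_dsacks == 2) {
			/* Stand-in for picking a new random label. */
			s->flow_label = (s->flow_label + 1) & 0xFFFFFu;
			s->consecutive_dsacks = 0;
		}
		return;
	}
	s->consecutive_dsacks = 0;
	if (seq == s->rcv_nxt)
		s->rcv_nxt = end_seq; /* in-order data; out-of-order queueing omitted */
}
```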
The good news is that this particular loss recovery code path is disabled by
default on 4.18+ kernels by this patch:
commit b38a51fec1c1f693f03b1aa19d0622123634d4b7
Author: Yuchung Cheng
Date: Wed May 16 16:40:11 2018 -0700
tcp: disable RFC6675 loss detection
>
> [Mon Aug 27 02:16:11 20
.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Wei Wang
Signed-off-by: Eric Dumazet
---
net/ipv4/tcp_input.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b8849588c440..9a09ff3afef2 100644
+0 > [ect01] . 4:4(0) ack 5501
+.31 < [ect0] . 5501:6501(1000) ack 4 win 257
+0 > [ect01] . 4:4(0) ack 6501
Fixes: 9aee40006190 ("tcp: ack immediately when a cwr packet arrives")
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
---
net/ipv4/tcp_input.c | 8 -
flag instead of calling
tcp_enter_quickack_mode().
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Wei Wang
Signed-off-by: Eric Dumazet
---
net/ipv4/tcp_dctcp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4
protocol states:
1) When a hole is repaired
2) When CE status changes between subsequent data packets received
3) When a data packet carries CWR flag
Yuchung Cheng (4):
tcp: mandate a one-time immediate ACK
tcp: avoid resetting ACK timer in DCTCP
tcp: always ACK immediately on hole repairs
tcp
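The three immediate-ACK triggers listed above can be sketched as a single predicate. The struct and function names are hypothetical; prev_ce is assumed to be the CE state of the previously received data segment.

```c
#include <stdbool.h>

/* Hypothetical per-segment summary; field names are illustrative. */
struct rx_seg_info {
	bool fills_hole; /* segment repairs a hole in the receive sequence space */
	bool ce;         /* segment arrived with the CE codepoint */
	bool cwr;        /* segment carried the CWR flag */
};

/* Returns true if the receiver should ACK immediately instead of delaying:
 * 1) a hole is repaired, 2) CE status changed from the previous data
 * segment, or 3) the segment carries CWR. */
static bool dctcp_ack_now(const struct rx_seg_info *seg, bool prev_ce)
{
	return seg->fills_hole || seg->ce != prev_ce || seg->cwr;
}
```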
tcp_enter_quickack_mode() because we do
not want to forget the icsk_ack.pingpong or icsk_ack.ato state.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Signed-off-by: Wei Wang
Signed-off-by: Eric Dumazet
---
include/net/inet_connection_sock.h | 3 ++-
net/ipv4/tcp_input.c | 4
> >> Signed-off-by: Lawrence Brakmo
> >> ---
> >> net/ipv4/tcp_input.c | 9 -
> >> 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > Seems like a nice mechanism to have, IMHO.
> >
> > Acked-by: Neal Cardwell
>
> Should t
on in tcp_send_ack and check the
actual ack sequence before cancelling the delayed ACK. Furthermore, it's
safer to pass the ack sequence number as a local variable into the
tcp_send_ack routine, instead of intercepting tp->rcv_nxt, to avoid
future bugs like this.
Reported-by: Neal Cardwell
Signed-off-by:
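A sketch of passing the ACK sequence explicitly and cancelling a pending delayed ACK only when the ACK covers everything received. Names are hypothetical. In the DCTCP case, an ACK for an older sequence (e.g. 2001, sent for the previous CE state while 3001 has already arrived) must leave the delayed ACK armed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is before b" for sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

struct ack_state {
	uint32_t rcv_nxt;    /* next expected sequence number */
	bool delack_pending; /* a delayed ACK is scheduled */
};

/* Hypothetical sketch: send an ACK for ack_seq without touching rcv_nxt,
 * and cancel the delayed ACK only if ack_seq >= rcv_nxt. Returns the
 * ack number that would be sent. */
static uint32_t send_ack(struct ack_state *s, uint32_t ack_seq)
{
	if (!seq_before(ack_seq, s->rcv_nxt))
		s->delack_pending = false;
	return ack_seq;
}
```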
, ..., 1) = 1
0.200 > [ect01] P. 1:2(1) ack 1001
0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
+0.005 < [ce] . 2001:3001(1000) ack 2 win 257
+0.000 > [ect01] . 2:2(0) ack 2001
// Previously the ACK below would be delayed by 40ms
+0.000 > [ect01] E. 2:2(0) ack 3001
+0.500 < F.
Refactor and create helpers to send the special ACK in DCTCP.
Signed-off-by: Yuchung Cheng
Acked-by: Neal Cardwell
---
net/ipv4/tcp_output.c | 22 +-
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index
This patch set addresses the fact that the existing DCTCP implementation does not
fully implement the ACK policy specified in the RFC. This improves
the responsiveness of CE status change particularly on flows with
small inflight.
Yuchung Cheng (3):
tcp: helpers to send special ack
tcp: do not cancel
After fixing the way DCTCP tracks delayed ACKs, the delayed-ACK
related callbacks are no longer needed.
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Acked-by: Neal Cardwell
---
include/net/tcp.h | 2 --
net/ipv4/tcp_dctcp.c | 25 -
net/ipv4
rything
+0.500 < F. 9501:9501(0) ack 4 win 257
Reported-by: Larry Brakmo
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Acked-by: Neal Cardwell
---
net/ipv4/tcp_dctcp.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_dctcp.c b/net/i
This patch series addresses the issue that DCTCP sometimes
fails to acknowledge the latest sequence, resulting in a sender timeout
if inflight is small.
Yuchung Cheng (2):
tcp: fix dctcp delayed ACK schedule
tcp: remove DELAYED ACK events in DCTCP
include/net/tcp.h | 2 --
net/ipv4
On Sat, Jul 7, 2018 at 7:07 AM, Neal Cardwell wrote:
> On Sat, Jul 7, 2018 at 7:15 AM David Miller wrote:
>>
>> From: Lawrence Brakmo
>> Date: Tue, 3 Jul 2018 09:26:13 -0700
>>
>> > We have observed high tail latencies when using DCTCP for RPCs as
>> > compared to using Cubic. For example, in
e due to ACK compression or decimation. Algorithms
> may want to use send rates and receive rates as separate signals.
>
> Signed-off-by: Deepti Raghavan
Acked-by: Yuchung Cheng
> ---
> include/net/tcp.h | 2 ++
> net/ipv4/tcp_rate.c | 4
> 2 files changed, 6 insertion
On Mon, Jul 2, 2018 at 2:39 PM, Lawrence Brakmo wrote:
>
> DCTCP depends on the CA_EVENT_NON_DELAYED_ACK and CA_EVENT_DELAYED_ACK
> notifications to keep track if it needs to send an ACK for packets that
> were received with a particular ECN state but whose ACK was delayed.
>
> Under some circumst
-by: Daniele Iamartino
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Signed-off-by: Neal Cardwell
---
net/ipv4/sysctl_net_ipv4.c | 18 +-
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index
On Wed, Jun 27, 2018 at 1:00 PM, Lawrence Brakmo wrote:
>
>
> From: on behalf of Yuchung Cheng
>
> Date: Wednesday, June 27, 2018 at 9:59 AM
> To: Neal Cardwell
> Cc: Lawrence Brakmo , Matt Mathis ,
> Netdev , Kernel Team , Blake
> Matheny , Alexei Starovoitov
On Wed, Jun 27, 2018 at 8:24 AM, Neal Cardwell wrote:
> On Tue, Jun 26, 2018 at 10:34 PM Lawrence Brakmo wrote:
>> The only issue is if it is safe to always use 2 or if it is better to
>> use min(2, snd_ssthresh) (which could still trigger the problem).
>
> Always using 2 SGTM. I don't think we n
On Tue, Jun 26, 2018 at 8:45 AM, Eric Dumazet wrote:
> Signed-off-by: Eric Dumazet
> ---
nice refactor!
Acked-by: Yuchung Cheng
> net/ipv4/tcp_minisocks.c | 223 ---
> 1 file changed, 113 insertions(+), 110 deletions(-)
>
> dif
nough.
>
> This should reduce the extra load noticed in DCTCP environments,
> after congestion events.
>
> This is part 2 of our effort to reduce pure ACK packets.
>
> Signed-off-by: Eric Dumazet
> ---
Acked-by: Yuchung Cheng
Thanks for this patch. I am still wondering h
44
>> values that this commit hard-coded.
>
>> Signed-off-by: Eric Dumazet
>> ---
>
> Very nice. I like the constants and the min(rcv_rtt, srtt).
>
> Acked-by: Neal Cardwell
Acked-by: Yuchung Cheng
Great work. Hopefully this will save middleboxes from handling
TCP ACKs themselves.
>
> Thanks!
>
> neal
On Thu, May 17, 2018 at 9:59 AM, Yuchung Cheng wrote:
> On Thu, May 17, 2018 at 9:41 AM, Neal Cardwell wrote:
>>
>> On Thu, May 17, 2018 at 11:40 AM Eric Dumazet
>> wrote:
>> > On 05/17/2018 08:14 AM, Neal Cardwell wrote:
>> > > Is there a particular mo
On Thu, May 17, 2018 at 9:41 AM, Neal Cardwell wrote:
>
> On Thu, May 17, 2018 at 11:40 AM Eric Dumazet
> wrote:
> > On 05/17/2018 08:14 AM, Neal Cardwell wrote:
> > > Is there a particular motivation for the cap of 127? IMHO 127 ACKs is
> quite
> > > a few to compress. Experience seems to show t
Refactor using a new helper, tcp_timeout_mark_loss(), that marks packets
lost upon RTO.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Reviewed-by: Eric Dumazet
Reviewed-by: Soheil Hassas Yeganeh
Reviewed-by: Priyaranjan Jha
---
net/ipv4/tcp_input.c | 50
3*MSS).
Also the minimum reordering window is reduced from 1 msec to 0
to recover more quickly on short-RTT transfers. Therefore RACK is more
aggressive in marking packets lost during recovery to reduce the
reordering window timeouts.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Review
simplifies the RTO code by sharing much of the logic with Fast
Recovery.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Reviewed-by: Eric Dumazet
Reviewed-by: Soheil Hassas Yeganeh
Reviewed-by: Priyaranjan Jha
---
include/net/tcp.h | 1 +
net/ipv4/tcp_input.c| 18
start from one packet (with Cubic
congestion control).
This commit was tested in an A/B test with Google web servers,
and showed a reduction of 2% in (spurious) retransmits post
timeout (SlowStartRetrans), and correspondingly reduced DSACKs
(DSACKIgnoredOld) by 7%.
Signed-off-by: Yuchung Cheng
Create and export a new helper tcp_rack_skb_timeout and move tcp_is_rack
to prepare the final RTO change.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Reviewed-by: Eric Dumazet
Reviewed-by: Soheil Hassas Yeganeh
Reviewed-by: Priyaranjan Jha
---
include/net/tcp.h | 2
Fast Recovery s.t. the inflight is updated
first before tcp_enter_recovery flips state to CA_Recovery.
2) avoid intertwining loss marking with state update, making the
code more readable.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Reviewed-by: Eric Dumazet
Reviewed-by: Soheil
This patch disables RFC6675 loss detection and makes the sysctl
net.ipv4.tcp_recovery control a binary choice between RACK
(1) and RFC6675 (0).
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Reviewed-by: Eric Dumazet
Reviewed-by: Soheil Hassas Yeganeh
Reviewed-by: Priyaranjan Jha
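A sketch of the binary choice described above, assuming bit 0 of net.ipv4.tcp_recovery selects the loss detector (the kernel names this bit TCP_RACK_LOSS_DETECTION, value 0x1). The function name is hypothetical.

```c
#include <stdbool.h>

/* Assumed bit value, as in the kernel's TCP_RACK_LOSS_DETECTION. */
#define RACK_LOSS_DETECTION 0x1

/* Sketch: bit 0 of net.ipv4.tcp_recovery selects RACK (set) or
 * RFC6675 (clear); other bits are ignored here. */
static bool use_rack_loss_detection(int sysctl_tcp_recovery)
{
	return (sysctl_tcp_recovery & RACK_LOSS_DETECTION) != 0;
}
```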
Linux. It should not be confused with the Reno
(AIMD) congestion control.
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Reviewed-by: Eric Dumazet
Reviewed-by: Soheil Hassas Yeganeh
Reviewed-by: Priyaranjan Jha
---
include/net/tcp.h | 1 +
net/ipv4/tcp_input.c| 19