From: Neal Cardwell
This commit shrinks inet_connection_sock by 4 bytes, by shrinking
icsk_mtup.enabled from 32 bits to 1 bit, and shrinking
icsk_mtup.probe_size from s32 to an unsigned 31-bit field.
This is to save space to compensate for the recent introduction of a
new u32 in
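A minimal sketch of the packing idea in self-contained C (illustrative field names; the exact kernel layout may differ):

#include <stdint.h>

/* Pack the MTU-probing state into one 32-bit word: a 1-bit enabled
 * flag plus an unsigned 31-bit probe size, replacing two full
 * 32-bit fields. */
struct mtup_sketch {
	uint32_t enabled:1;	/* was a 32-bit int */
	uint32_t probe_size:31;	/* was an s32 */
};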
From: Neal Cardwell
When cwnd is not a multiple of the TSO skb size of N*MSS, we can get
into persistent scenarios where we have the following sequence:
(1) ACK for full-sized skb of N*MSS arrives
-> tcp_write_xmit() transmit full-sized skb with N*MSS
-> move pacing release time f
From: Neal Cardwell
In the header prediction fast path for a bulk data receiver, if no
data is newly acknowledged then we do not call tcp_ack() and do not
call tcp_ack_update_window(). This means that a bulk receiver that
receives large amounts of data can have the incoming sequence numbers
wrap
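For reference, the TCP stack compares sequence numbers with wrap-safe modular arithmetic; a self-contained sketch of the before()/after() helpers from include/net/tcp.h:

#include <stdint.h>

/* True if seq1 precedes seq2 in 32-bit modular sequence space; only
 * valid while the two values stay within 2^31 of each other, which is
 * why state refreshed only on window updates can go stale. */
static inline int before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}
#define after(seq2, seq1) before(seq1, seq2)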
From: Neal Cardwell
Simplify tcp_set_congestion_control() by removing the initialization
code path for the !load case.
There are only two call sites for tcp_set_congestion_control(). The
EBPF call site is the only one that passes load=false; it also passes
cap_net_admin=true. Because of that
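A sketch of the call-site analysis above (simplified; not the exact kernel prototypes):

/*
 * The two callers of tcp_set_congestion_control():
 *
 *   setsockopt(TCP_CONGESTION):  load=true,  cap_net_admin varies
 *   EBPF via _bpf_setsockopt():  load=false, cap_net_admin=true
 *
 * Because the only load=false caller is always fully privileged, the
 * separate initialization path for !load adds no policy and can be
 * removed.
 */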
From: Neal Cardwell
This patch series reorganizes TCP congestion control initialization so that if
EBPF code called by tcp_init_transfer() sets the congestion control algorithm
by calling setsockopt(TCP_CONGESTION) then the TCP stack initializes the
congestion control module immediately, instead
From: Neal Cardwell
Now that the previous patches have removed the code that uses the
flags argument to _bpf_setsockopt(), we can remove that argument.
Signed-off-by: Neal Cardwell
Acked-by: Yuchung Cheng
Acked-by: Kevin Yang
Signed-off-by: Eric Dumazet
Cc: Lawrence Brakmo
---
net/core
From: Neal Cardwell
Now that the previous patch ensures we don't initialize the congestion
control twice, when EBPF sets the congestion control algorithm at
connection establishment we can simplify the code by simply
initializing the congestion control module at that time.
Signed-off-by:
From: Neal Cardwell
Now that the previous patches ensure that all call sites for
tcp_set_congestion_control() want to initialize congestion control, we
can simplify tcp_set_congestion_control() by removing the reinit
argument and the code to support it.
Signed-off-by: Neal Cardwell
Acked-by
From: Neal Cardwell
Change tcp_init_transfer() to only initialize congestion control if it
has not been initialized already.
With this new approach, we can arrange things so that if the EBPF code
sets the congestion control by calling setsockopt(TCP_CONGESTION) then
tcp_init_transfer() will not
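A minimal sketch of the resulting guard (assuming a per-socket flag along the lines of icsk_ca_initialized; details may differ):

/* In tcp_init_transfer(), after the BPF hook has run and possibly
 * called setsockopt(TCP_CONGESTION): */
if (!inet_csk(sk)->icsk_ca_initialized)
	tcp_init_congestion_control(sk);	/* skip if EBPF already did it */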
On Mon, Oct 21, 2019 at 5:11 PM Jason Baron wrote:
>
>
>
> On 10/21/19 4:36 PM, Eric Dumazet wrote:
> > On Mon, Oct 21, 2019 at 12:53 PM Christoph Paasch wrote:
> >>
> >
> >> Actually, longterm I hope we would be able to get rid of the
> >> blackhole-detection and fallback heuristics. In a far di
On Mon, Oct 21, 2019 at 8:04 PM Subash Abhinov Kasiviswanathan
wrote:
>
> > Interesting! As tcp_input.c summarizes, "packets_out is
> > SND.NXT-SND.UNA counted in packets". In the normal operation of a
> > socket, tp->packets_out should not be 0 if any of those other fields
> > are non-zero.
> >
>
On Sun, Oct 20, 2019 at 10:45 PM Subash Abhinov Kasiviswanathan
wrote:
>
> > FIN-WAIT1 just means the local application has called close() or
> > shutdown() to shut down the sending direction of the socket, and the
> > local TCP stack has sent a FIN, and is waiting to receive a FIN and an
> > ACK
On Sun, Oct 20, 2019 at 7:15 PM Subash Abhinov Kasiviswanathan
wrote:
>
> > Hmm. Random related thought while searching for a possible cause: I
> > wonder if tcp_write_queue_purge() should clear tp->highest_sack (and
> > possibly tp->sacked_out)? The tcp_write_queue_purge() code is careful
> > to
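A hedged sketch of that suggestion (not a committed fix): when purging the write queue, drop the SACK bookkeeping that points into it, so a later ACK cannot dereference a stale pointer.

/* In tcp_write_queue_purge(), alongside freeing the skbs: */
tp->highest_sack = NULL;
tp->sacked_out = 0;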
On Sun, Oct 20, 2019 at 4:25 PM Subash Abhinov Kasiviswanathan wrote:
>
> We are seeing a crash in the TCP ACK codepath often in our regression
> racks with an ARM64 device running a 4.19-based kernel.
>
> It appears that tp->highest_sack is invalid when being accessed when
> a
other failures, such as SYN/ACK + data being dropped, will result in the
> connection not becoming established. And a connection blackhole after
> session establishment shows up as a stalled connection.
>
> Signed-off-by: Jason Baron
> Cc: Eric Dumazet
> Cc: Neal Cardwell
> Cc:
On Tue, Sep 17, 2019 at 1:22 PM Eric Dumazet wrote:
>
> Tue, Sep 17, 2019 at 10:13 AM Jason Baron wrote:
> >
> >
> > Hi,
> >
> > I was interested in adding a field to tcp_info around the TFO state of a
> > socket. So for the server side it would indicate if TFO was used to
> > create the socket
On Fri, Sep 13, 2019 at 7:23 PM Thomas Higdon wrote:
>
> Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
> performance problems --
> > (1) Usually when we're diagnosing TCP performance problems, we do so
> > from the sender, since th
_INFO, and
> has the same name.
>
> Also note that we avoid increasing the size of the tcp_sock struct by
> taking advantage of a hole.
>
> Signed-off-by: Thomas Higdon
> ---
> changes since v4:
> - optimize placement of rcv_ooopack to avoid increasing tcp_sock struct
>
On Fri, Sep 13, 2019 at 5:29 PM Yuchung Cheng wrote:
> > What if the comment is shortened up to fit in 80 columns and the units
> > (bytes) are added, something like:
> >
> > __u32 tcpi_snd_wnd; /* peer's advertised recv window (bytes) */
> just a thought: will tcpi_peer_rcv_
On Fri, Sep 13, 2019 at 3:36 PM Thomas Higdon wrote:
>
> Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
> performance problems --
> > (1) Usually when we're diagnosing TCP performance problems, we do so
> > from the sender, since th
On Fri, Sep 13, 2019 at 3:37 PM Thomas Higdon wrote:
>
> For receive-heavy cases on the server-side, we want to track the
> connection quality for individual client IPs. This counter, similar to
> the existing system-wide TCPOFOQueue counter in /proc/net/netstat,
> tracks out-of-order packet recep
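Both counters discussed in this thread surface in struct tcp_info; a self-contained sketch of reading them, assuming a kernel and headers new enough to define tcpi_snd_wnd and tcpi_rcv_ooopack:

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

static void dump_tcp_info(int fd)
{
	struct tcp_info info;
	socklen_t len = sizeof(info);

	memset(&info, 0, sizeof(info));
	if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) == 0)
		printf("snd_wnd=%u ooo_pkts=%u\n",
		       info.tcpi_snd_wnd, info.tcpi_rcv_ooopack);
}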
On Fri, Sep 13, 2019 at 10:29 AM Thomas Higdon wrote:
>
> On Thu, Sep 12, 2019 at 10:14:33AM +0100, Dave Taht wrote:
> > On Thu, Sep 12, 2019 at 1:59 AM Neal Cardwell wrote:
> > >
> > > On Wed, Sep 11, 2019 at 6:32 PM Thomas Higdon wrote:
> > > >
>
On Wed, Sep 11, 2019 at 6:32 PM Thomas Higdon wrote:
>
> Neal Cardwell mentioned that rcv_wnd would be useful for helping
> diagnose whether a flow is receive-window-limited at a given instant.
>
> This serves the purpose of adding an additional __u32 to avoid the
> would-be
ivers.
>
> It has been used at Google for about four years,
> and has been discussed at various networking conferences.
>
> [1] segments smaller than MSS already have PSH flag set
> by tcp_sendmsg() / tcp_mark_push(), unless MSG_MORE
> has been requested by the user.
>
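The helper referenced in [1] is tiny; a sketch of its shape (as in net/ipv4/tcp.c, modulo version drift):

/* Mark the skb to carry PSH and remember the push point, so sub-MSS
 * writes are delivered to the receiving application promptly. */
static inline void tcp_mark_push(struct tcp_sock *tp, struct sk_buff *skb)
{
	TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_PSH;
	tp->pushed_seq = tp->write_seq;
}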
On Tue, Sep 10, 2019 at 4:39 PM Eric Dumazet wrote:
>
> On Tue, Sep 10, 2019 at 10:11 PM Thomas Higdon wrote:
> >
> >
> ...
> > Because an additional 32-bit member in struct tcp_info would cause
> > a hole on 64-bit systems, we reserve a struct member '_reserved'.
> ...
> > diff --git a/include/u
remove it")
Signed-off-by: Neal Cardwell
Acked-by: Yuchung Cheng
Acked-by: Soheil Hassas Yeganeh
Cc: Eric Dumazet
---
net/ipv4/tcp_input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c21e8a22fb3b..8a1cd93
and
> call sk->sk_write_space(sk) accordingly.
>
> Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue
> is empty")
> Signed-off-by: Eric Dumazet
> Cc: Jason Baron
> Reported-by: Vladimir Rutsky
> Cc: Soheil Hassas Yeganeh
>
o renames the do_nonblock label since we might reach this
> code path even if we were in blocking mode.
>
> Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure")
> Signed-off-by: Eric Dumazet
> Cc: Jason Baron
> Reported-by: Vladimir Rutsky
> ---
ntrol the floor of MSS probing.
>
> Signed-off-by: Josh Hunt
> ---
Acked-by: Neal Cardwell
Thanks, Josh. I agree with Eric that it would be great if you are able
to share the value that you have found to work well.
neal
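A hedged sketch of how such a floor plugs into MTU probing, assuming the sysctl name tcp_mtu_probe_floor from the patch (its exact placement in tcp_mtu_probing() may differ):

/* On repeated timeouts, never let the probing search range drop
 * below the administrator-configured floor. */
icsk->icsk_mtup.search_low =
	tcp_mss_to_mtu(sk, net->ipv4.sysctl_tcp_mtu_probe_floor);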
On Thu, Aug 8, 2019 at 2:13 AM Eric Dumazet wrote:
> On 8/8/19 1:52 AM, Josh Hunt wrote:
> > TCP_BASE_MSS is used as the default initial MSS value when MTU probing is
> > enabled. Update the comment to reflect this.
> >
> > Suggested-by: Neal Cardwell
&
On Fri, Aug 2, 2019 at 3:03 PM Bernd wrote:
>
> Hello,
>
> While analyzing an aborted upload packet capture, I came across an odd
> trace where a sender was not responding to a duplicate SACK but
> sending further segments until it stalled.
>
> Took me some time until I remembered this fix, and actua
On Sun, Jul 28, 2019 at 5:14 PM Josh Hunt wrote:
>
> On 7/28/19 6:54 AM, Eric Dumazet wrote:
> > On Sun, Jul 28, 2019 at 1:21 AM Josh Hunt wrote:
> >>
> >> On 7/27/19 12:05 AM, Eric Dumazet wrote:
> >>> On Sat, Jul 27, 2019 at 4:23 AM Josh Hunt wrote:
>
> The current implementation of
b21c7c16 ("bpf: Add support for changing congestion control")
> Signed-off-by: Eric Dumazet
> Cc: Lawrence Brakmo
> Reported-by: Neal Cardwell
> ---
Acked-by: Neal Cardwell
Thanks, Eric!
neal
On Sat, Jul 6, 2019 at 2:19 PM Carlo Wood wrote:
>
> While investigating this further, I read on
> http://www.masterraghu.com/subjects/np/introduction/unix_network_programming_v1.3/ch07lev1sec5.html
> under "SO_RCVBUF and SO_SNDBUF Socket Options":
>
> When setting the size of the TCP socket r
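The practical upshot for applications, as a self-contained sketch: the window scale factor is derived from the receive buffer size in effect when the SYN is sent, so SO_RCVBUF must be set before connect() (or before listen() on the server side).

#include <sys/socket.h>
#include <netinet/in.h>

static int connect_with_big_rcvbuf(int fd, const struct sockaddr_in *dst)
{
	int rcvbuf = 4 * 1024 * 1024;	/* ask for a 4 MB receive buffer */

	/* Must precede connect(): the window scale is chosen from the
	 * buffer size at SYN time and cannot grow afterwards. */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
		return -1;
	return connect(fd, (const struct sockaddr *)dst, sizeof(*dst));
}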
On Tue, Jun 11, 2019 at 2:46 AM Zhongjie Wang wrote:
>
> Hi Neal,
>
> Thanks for your valuable feedback! Yes, I think you are right.
> It seems to not be a problem if tp->urg_data and tp->urg_seq are used together.
> From our test results, we can only see there are some paths requiring
> specific initia
On Mon, Jun 10, 2019 at 7:48 PM Zhongjie Wang wrote:
>
> Hi Neal,
>
> Thanks for your reply. Sorry, I made a mistake in my previous email.
> After I double checked the source code, I think it should be tp->urg_seq,
> which is used before assignment, instead of tp->copied_seq.
> Still in the same i
On Sun, Jun 9, 2019 at 11:12 PM Zhongjie Wang wrote:
...
> It compares tp->copied_seq with tp->rcv_nxt.
> However, tp->copied_seq is only assigned to an appropriate sequence number
> when
> it copies data to user space. So here tp->copied_seq could be equal to 0,
> which is its initial value, if
ntly, calling it only once per RTT.
>
> Signed-off-by: Eric Dumazet
> Cc: Yuchung Cheng
> Cc: Neal Cardwell
> Cc: Soheil Hassas Yeganeh
> Cc: Florian Westphal
> Cc: Daniel Borkmann
> Cc: Lawrence Brakmo
> Cc: Abdul Kabbani
> ---
Thanks, Eric!
There is a slight
On Wed, Apr 3, 2019 at 8:13 PM brakmo wrote:
>
> When a packet is dropped when calling queue_xmit in __tcp_transmit_skb
> and packets_out is 0, it is beneficial to set a small probe timer.
> Otherwise, the throughput for the flow can suffer because it may need to
> depend on the probe timer to st
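A hedged sketch of the idea (not necessarily the exact patch): after a local drop with nothing in flight, arm the zero-window probe timer rather than leaving the flow with no pending timer at all.

/* After queue_xmit reports a drop in __tcp_transmit_skb(): */
if (unlikely(err) && !tcp_sk(sk)->packets_out)
	tcp_check_probe_timer(sk);	/* arms ICSK_TIME_PROBE0 if idle */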
On Sat, Feb 23, 2019 at 6:51 PM Eric Dumazet wrote:
>
> syzbot reported a WARN_ON(!tcp_skb_pcount(skb))
> in tcp_send_loss_probe() [1]
>
> This was caused by TCP_REPAIR sent skbs that inadvertently
> were missing a call to tcp_init_tso_segs()
>
Acked-by: Neal Cardwell
Thanks, Eric!
neal
Signed-off-by: Eric Dumazet
> Reported-by: soukjin bae
> ---
> net/ipv4/tcp_ipv4.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
Acked-by: Neal Cardwell
Thanks!
neal
2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Acked-by: Neal Cardwell
Thanks!
neal
commit b701a99e431d ("tcp: Add
> tcp_clamp_rto_to_user_timeout() helper to improve accuracy"), but
> predates git history.
>
> Signed-off-by: Eric Dumazet
> Acked-by: Soheil Hassas Yeganeh
> ---
Acked-by: Neal Cardwell
Thanks!
neal
se on a receiver
> without GRO, but the spectacular gain is really on
> 1000x release_sock() latency reduction I have measured.
>
> Signed-off-by: Eric Dumazet
> Cc: Neal Cardwell
> Cc: Yuchung Cheng
> ---
Acked-by: Neal Cardwell
Thanks!
neal
> account how many ACKs were coalesced; this information
> will be available in skb_shinfo(skb)->gso_segs
>
> Signed-off-by: Eric Dumazet
> ---
Acked-by: Neal Cardwell
Thanks!
neal
se on a receiver
> without GRO, but the spectacular gain is really on
> 1000x release_sock() latency reduction I have measured.
>
> Signed-off-by: Eric Dumazet
> Cc: Neal Cardwell
> Cc: Yuchung Cheng
> ---
...
> + if (TCP_SKB_CB(tail)->end_seq != TCP_SKB_CB(skb)->
situation.
>
> Reported-by: Jean-Louis Dupond
> Signed-off-by: Eric Dumazet
> ---
Acked-by: Neal Cardwell
Nice. Thanks!
neal
On Tue, Nov 27, 2018 at 10:57 AM Eric Dumazet wrote:
>
> Neal pointed out that non-SACK flows might suffer from ACK compression
> added in the following patch ("tcp: implement coalescing on backlog queue")
>
> Instead of tweaking tcp_add_backlog() we can take into
> account how many ACK were coale
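A hedged sketch of how the ACK processing side can compensate, close in spirit to what later appeared in tcp_ack():

/* Treat one coalesced skb as N duplicate ACKs for reno accounting;
 * gso_segs is 0 on an ordinary uncoalesced ACK, hence the max(). */
num_dupack = max_t(u16, 1, skb_shinfo(skb)->gso_segs);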
On Tue, Nov 27, 2018 at 10:57 AM Eric Dumazet wrote:
>
> Tell the compiler that most TCP flows are using SACK these days.
>
> There is no need to add the unlikely() clause in tcp_is_reno(),
> the compiler is able to infer it.
>
> Signed-off-by: Eric Dumazet
> ---
Acked-
On Wed, Nov 21, 2018 at 12:52 PM Eric Dumazet wrote:
>
> In case GRO is not as efficient as it should be or disabled,
> we might have a user thread trapped in __release_sock() while the
> softirq handler floods packets up to the point we have to drop.
>
> This patch balances work done from user thread
reduces bursts for HZ=100 or HZ=250 kernels, making TCP
> behavior more uniform.
>
> Signed-off-by: Eric Dumazet
> Acked-by: Soheil Hassas Yeganeh
> ---
Nice. Thanks!
Acked-by: Neal Cardwell
neal
cs to avoid overflows.
>
> Signed-off-by: Eric Dumazet
> Acked-by: Soheil Hassas Yeganeh
> ---
> net/ipv4/tcp_output.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
Thanks!
Acked-by: Neal Cardwell
neal
>
> Signed-off-by: Eric Dumazet
> Acked-by: Soheil Hassas Yeganeh
> ---
> net/ipv4/tcp_output.c | 4 ++++
> 1 file changed, 4 insertions(+)
Thanks!
Acked-by: Neal Cardwell
neal
t fq ce_threshold 2.5ms
>
> Signed-off-by: Eric Dumazet
> ---
Very nice! Thanks, Eric. :-)
Acked-by: Neal Cardwell
neal
te an old comment to reflect the new approach.
Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Eric Dumazet
---
net/ipv4/tcp_bbr.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/net/ipv4/
Centralize the code that sets gains used for computing cwnd and pacing
rate. This simplifies the code and makes it easier to change the state
machine or (in the future) dynamically change the gain values and
ensure that the correct gain values are always used.
Signed-off-by: Neal Cardwell
Signed
The second patch adjusts the TCP BBR logic to centralize the setting
of gain values, to simplify the code and prepare for future changes.
Neal Cardwell (2):
tcp_bbr: adjust TCP BBR for departure time pacing
tcp_bbr: centralize code to set gains
net/ipv4/tcp_bbr.c | 77
nt
o if pushing in_network down (pacing_gain < 1.0),
then in_network goes below target upon an ACK event
This commit changes the BBR state machine to use this estimated
"packets in network" value to make its decisions.
Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
Signed
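A hedged, userspace-style sketch of the decision rule described above (names simplified; the real tcp_bbr.c logic also gates on elapsed time and losses):

#include <stdint.h>

/* With pacing_gain > 1, keep probing until the estimated packets in
 * the network reach gain * target; with pacing_gain < 1, drain until
 * they fall back to the target. */
static int bbr_phase_done(uint32_t in_network, uint32_t target,
			  double pacing_gain)
{
	if (pacing_gain > 1.0)
		return in_network >= (uint32_t)(pacing_gain * target);
	if (pacing_gain < 1.0)
		return in_network <= target;
	return 1;	/* gain == 1.0 phases advance purely on a timer */
}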
On Sat, Sep 8, 2018 at 11:23 AM Ttttabcd wrote:
>
> Thank you very much for your previous answer, sorry for the inconvenience.
>
> But now I want to ask you one more question.
>
> The question is: why do we need two variables to control the SYN queue?
>
> The first is the "backlog" parameter of the "l
On Tue, Sep 4, 2018 at 1:48 AM Ttttabcd wrote:
>
> Hello everyone, recently I have been looking at the source code for handling the
> TCP three-way handshake (Linux kernel version 4.18.5).
>
> I found some strange places in the source code for handling SYN messages.
>
> in the function "tcp_conn_request"
>
>
ngestion control")
Signed-off-by: Neal Cardwell
Acked-by: Yuchung Cheng
Acked-by: Soheil Hassas Yeganeh
Acked-by: Priyaranjan Jha
Reviewed-by: Eric Dumazet
---
net/ipv4/tcp_bbr.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index
On Tue, Jul 24, 2018 at 1:42 PM Lawrence Brakmo wrote:
>
> Note that without this fix the 99% latencies when doing 10KB RPCs
> in a congested network using DCTCP are 40ms vs. 190us with the patch.
> Also note that these 40ms high tail latencies started after commit
> 3759824da87b30ce7a35b4873b62b0
On Tue, Jul 24, 2018 at 1:07 PM Yuchung Cheng wrote:
>
> On Mon, Jul 23, 2018 at 7:23 PM, Daniel Borkmann wrote:
> > Should this go to net tree instead where all the other fixes went?
> I am neutral but this feels more like a feature improvement
I agree this feels like a feature improvement rath
.
> Modified based on comments by Neal Cardwell
>
> Signed-off-by: Lawrence Brakmo
> ---
> net/ipv4/tcp_input.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
Seems like a nice mechanism to have, IMHO.
Acked-by: Neal Cardwell
Thanks!
neal
nding. It does seem to be showing up in patchwork now:
https://patchwork.ozlabs.org/patch/941532/
And I can confirm I'm able to apply it to net-next.
Acked-by: Neal Cardwell
thanks,
neal
p 99.9%
> > 1MB RPCs   2.6ms  5.5ms  43ms  208ms
> > 10KB RPCs  1.1ms  1.3ms  53ms  212ms
> ...
> > v2: Removed call to tcp_ca_event from tcp_send_ack since I added one in
> > tcp_event_ack_sent. Based on Neal Cardwell
> >
On Tue, Jul 3, 2018 at 11:10 AM Lawrence Brakmo wrote:
>
> On 7/2/18, 5:52 PM, "netdev-ow...@vger.kernel.org on behalf of Neal Cardwell"
> wrote:
>
> On Mon, Jul 2, 2018 at 5:39 PM Lawrence Brakmo wrote:
> >
> > We have observed high tail l
On Mon, Jul 2, 2018 at 7:49 PM Yuchung Cheng wrote:
>
> On Mon, Jul 2, 2018 at 2:39 PM, Lawrence Brakmo wrote:
> >
> > DCTCP depends on the CA_EVENT_NON_DELAYED_ACK and CA_EVENT_DELAYED_ACK
> > notifications to keep track if it needs to send an ACK for packets that
> > were received with a partic
e current packet should be enough.
This should reduce the extra load noticed in DCTCP environments,
after congestion events.
This is part 2 of our effort to reduce pure ACK packets.
Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Acked-by:
On Fri, Jun 29, 2018 at 9:48 PM Lawrence Brakmo wrote:
>
> DCTCP depends on the CA_EVENT_NON_DELAYED_ACK and CA_EVENT_DELAYED_ACK
> notifications to keep track if it needs to send an ACK for packets that
> were received with a particular ECN state but whose ACK was delayed.
>
> Under some circumst
On Sat, Jun 30, 2018 at 9:47 PM Lawrence Brakmo wrote:
> I see two issues: one is that entering quickack mode as you
> mentioned does not ensure that it will still be on when the CWR
> arrives. The second issue is that the problem occurs right after the
> receiver sends a small reply which results
one ACK later (when we get an ACK that doesn't cover a
retransmit). But that seems fine to me.
I also cooked the new packetdrill test below to explicitly cover this
case you are addressing (please let me know if you have an alternate
suggestion).
Tested-by: Neal Cardwell
Acked-by: Nea
On Fri, Jun 29, 2018 at 9:48 PM Lawrence Brakmo wrote:
>
> We have observed high tail latencies when using DCTCP for RPCs as
> compared to using Cubic. For example, in one setup there are 2 hosts
> sending to a 3rd one, with each sender having 3 flows (1 stream,
> 1 1MB back-to-back RPCs and 1 1
On Fri, Jun 29, 2018 at 9:48 PM Lawrence Brakmo wrote:
>
> We observed high 99 and 99.9% latencies when doing RPCs with DCTCP. The
> problem is triggered when the last packet of a request arrives CE
> marked. The reply will carry the ECE mark causing TCP to shrink its cwnd
> to 1 (because there ar
On Fri, Jun 29, 2018 at 6:07 AM Ilpo Järvinen wrote:
>
> If SACK is not enabled and the first cumulative ACK after the RTO
> retransmission covers more than the retransmitted skb, a spurious
> FRTO undo will trigger (assuming FRTO is enabled for that RTO).
> The reason is that any non-retransmitte
On Thu, Jun 28, 2018 at 4:20 PM Lawrence Brakmo wrote:
>
> I just looked at 4.18 traces and the behavior is as follows:
>
>Host A sends the last packets of the request
>
>Host B receives them, and the last packet is marked with congestion (CE)
>
>Host B sends ACKs for packets not marke
d-off-by: Eric Dumazet
> Reported-by: Neal Cardwell
> Cc: Lawrence Brakmo
> ---
> net/ipv4/tcp_input.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Acked-by: Neal Cardwell
Thanks, Eric!
neal
On Tue, Jun 26, 2018 at 10:34 PM Lawrence Brakmo wrote:
> The only issue is if it is safe to always use 2 or if it is better to
> use min(2, snd_ssthresh) (which could still trigger the problem).
Always using 2 SGTM. I don't think we need min(2, snd_ssthresh), as
that should be the same as just 2
On Tue, Jun 26, 2018 at 11:46 AM Eric Dumazet wrote:
>
> Signed-off-by: Eric Dumazet
> ---
> net/ipv4/tcp_minisocks.c | 223 ++++++++++----------
> 1 file changed, 113 insertions(+), 110 deletions(-)
Yes, very nice clean-up! Thanks for doing this.
Acked-by
On Tue, May 29, 2018 at 11:45 AM Marcelo Ricardo Leitner <marcelo.leit...@gmail.com> wrote:
> - patch2 - fix rtx attack vector
> - Add a floor value of HZ/20 to rto_min (which fits the values
> that Michael shared in the other email)
I would encourage allowing minimum RTO values down to
On Tue, May 22, 2018 at 8:31 PM kbuild test robot wrote:
> Hi Eric,
> Thank you for the patch! Yet something to improve:
> [auto build test ERROR on net/master]
> [also build test ERROR on v4.17-rc6 next-20180517]
> [cannot apply to net-next/master]
> [if your patch is applied to the wrong git
ugh.
> This should reduce the extra load noticed in DCTCP environments,
> after congestion events.
> This is part 2 of our effort to reduce pure ACK packets.
> Signed-off-by: Eric Dumazet
> ---
Acked-by: Neal Cardwell
Thanks!
neal
On Mon, May 21, 2018 at 6:09 PM Eric Dumazet wrote:
> We want to add finer control of the number of ACK packets sent after
> ECN events.
> This patch is not changing current behavior; it only enables the
> following change.
> Signed-off-by: Eric Dumazet
> ---
Acked-by: Neal
On Thu, May 17, 2018 at 5:47 PM Eric Dumazet wrote:
> This per netns sysctl allows for TCP SACK compression fine-tuning.
> This limits the number of SACKs that can be compressed.
> Using 0 disables SACK compression.
> Signed-off-by: Eric Dumazet
> ---
Acked-by: Neal Cardwell
Thanks!
neal
On Thu, May 17, 2018 at 5:47 PM Eric Dumazet wrote:
> This per netns sysctl allows for TCP SACK compression fine-tuning.
> Its default value is 1,000,000 ns, or 1 ms, to match the TSO autosizing period.
> Signed-off-by: Eric Dumazet
> ---
Acked-by: Neal Cardwell
Thanks!
neal
counter is added in the following patch.
> Two other patches add sysctls to allow changing the 1,000,000 and 44
> values that this commit hard-coded.
> Signed-off-by: Eric Dumazet
> ---
Very nice. I like the constants and the min(rcv_rtt, srtt).
Acked-by: Neal Cardwell
Thanks!
neal
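A hedged sketch of the resulting delay computation, close in spirit to __tcp_ack_snd_check() (constants per the description above):

/* Hold a compressed SACK for about RTT/20, but never longer than the
 * sysctl cap (default 1,000,000 ns = 1 ms), using the smaller of the
 * receive-side RTT estimate and srtt. */
rtt_us = min(rcv_rtt_us, srtt_us);
delay_ns = min_t(unsigned long, sysctl_tcp_comp_sack_delay_ns,
		 rtt_us * NSEC_PER_USEC / 20);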
On Thu, May 17, 2018 at 11:40 AM Eric Dumazet
wrote:
> On 05/17/2018 08:14 AM, Neal Cardwell wrote:
> > Is there a particular motivation for the cap of 127? IMHO 127 ACKs is quite
> > a few to compress. Experience seems to show that it works well to have one
> > GRO A
On Thu, May 17, 2018 at 8:12 AM Eric Dumazet wrote:
> When TCP receives an out-of-order packet, it immediately sends
> a SACK packet, generating network load but also forcing the
> receiver to send 1-MSS pathological packets, increasing its
> RTX queue length/depth, and thus processing time.
> W
compression or losses.
> We plan to add SACK compression in the following patch; we
> must therefore not call tcp_enter_quickack_mode()
> Signed-off-by: Eric Dumazet
> ---
Acked-by: Neal Cardwell
Thanks!
neal
0.0
> Signed-off-by: Eric Dumazet
> ---
Acked-by: Neal Cardwell
Thanks!
neal
On Thu, May 17, 2018 at 8:12 AM Eric Dumazet wrote:
> Socket can not disappear under us.
> Signed-off-by: Eric Dumazet
> ---
Acked-by: Neal Cardwell
Thanks!
neal
seq.
> This patch also replaces the BUG() by a less intrusive WARN_ON_ONCE()
> kernel BUG at net/ipv4/tcp_output.c:2837!
...
> Fixes: cf60af03ca4e ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
> Signed-off-by: Eric Dumazet
> Cc: Yuchung Cheng
> Cc: Neal Cardwell
retransmit queue")
> Signed-off-by: Eric Dumazet
> Reported-by: Michael Wenig
> Tested-by: Michael Wenig
> ---
Acked-by: Neal Cardwell
Nice. Thanks, Eric!
neal
arting).
This commit is a stable candidate for kernels back as far as 4.9.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Priyaranjan Jha
Signed-off-by: Yousuk