thanks to coalescing done on backlog, but cleans the 16 skbs
> found in rtx rb-tree.
>
> Reported-by: Soheil Hassas Yeganeh
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Thank you very much, Eric!
> ---
> net/ipv4/tcp.c | 11 ++-
> 1 file changed,
ry/common.c:168 [inline]
> prepare_exit_to_usermode+0x39d/0x4d0 arch/x86/entry/common.c:199
> syscall_return_slowpath+0x90/0x5c0 arch/x86/entry/common.c:279
> do_syscall_64+0xe2/0xf0 arch/x86/entry/common.c:305
> entry_SYSCALL_64_after_hwframe+0x63/0xe7
>
> Fixes: 336c39a031
From: Soheil Hassas Yeganeh
Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Yuchung Cheng
Signed-off-by: Willem de Bruijn
Reviewed-by: Eric Dumazet
Reviewed-by: Neal Cardwell
---
tools/testing/selftests/net/Makefile | 3 +-
tools/testing/selftests/net/tcp_inq.c | 189
From: Soheil Hassas Yeganeh
Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer size to drain sockets. Knowing
the size of receive queue length, applications can optimize
how they allocate buffers to read
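
A minimal user-space sketch of how an application might consume such a queue-length hint. It assumes the TCP_INQ socket option and the TCP_CM_INQ control message from linux/tcp.h (the selftest tcp_inq.c in this series exercises the same feature). The helper names enable_tcp_inq and inq_from_cmsg are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Fallbacks for older libc headers; both values mirror linux/tcp.h. */
#ifndef TCP_INQ
#define TCP_INQ 36
#endif
#ifndef TCP_CM_INQ
#define TCP_CM_INQ TCP_INQ
#endif

/* Hypothetical helper: ask the kernel to attach the in-queue byte count
 * to every successful recvmsg() on this TCP socket. */
int enable_tcp_inq(int fd)
{
    int one = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_INQ, &one, sizeof(one));
}

/* Hypothetical helper: return the queued-byte hint found in the control
 * data of msg, or -1 if no such control message is present. */
int inq_from_cmsg(struct msghdr *msg)
{
    struct cmsghdr *cm;

    assert(msg != NULL);
    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == IPPROTO_TCP && cm->cmsg_type == TCP_CM_INQ) {
            int inq;

            /* Copy out instead of casting to avoid alignment issues. */
            memcpy(&inq, CMSG_DATA(cm), sizeof(inq));
            return inq;
        }
    }
    return -1;
}
```

An application could call inq_from_cmsg() after each recvmsg() and size its next read buffer from the returned value instead of allocating for the worst case.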
On Fri, Apr 27, 2018 at 2:50 PM, Soheil Hassas Yeganeh
wrote:
> From: Soheil Hassas Yeganeh
>
> Signed-off-by: Soheil Hassas Yeganeh
> Signed-off-by: Yuchung Cheng
> Signed-off-by: Willem de Bruijn
> Reviewed-by: Eric Dumazet
> Reviewed-by: Neal Cardwell
Really sorry
From: Soheil Hassas Yeganeh
Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer size to drain sockets. Knowing
the size of receive queue length, applications can optimize
how they allocate buffers to read
From: Soheil Hassas Yeganeh
Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Yuchung Cheng
Signed-off-by: Willem de Bruijn
Reviewed-by: Eric Dumazet
Reviewed-by: Neal Cardwell
---
tools/testing/selftests/net/Makefile | 3 +-
tools/testing/selftests/net/tcp_inq.c | 189
On Mon, Apr 30, 2018 at 11:43 AM, Eric Dumazet wrote:
> On 04/30/2018 08:38 AM, David Miller wrote:
>> From: Soheil Hassas Yeganeh
>> Date: Fri, 27 Apr 2018 14:57:32 -0400
>>
>>> Since the socket lock is not held when calculating the size of
>>> receive que
On Mon, Apr 30, 2018 at 12:10 PM, David Miller wrote:
> From: Eric Dumazet
> Date: Mon, 30 Apr 2018 09:01:47 -0700
>
>> TCP sockets are read by a single thread really (or synchronized
>> threads), or garbage is ensured, regardless of how the kernel
>> ensures locking while reporting "queue length
From: Soheil Hassas Yeganeh
Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer size to drain sockets. Knowing
the size of receive queue length, applications can optimize
how they allocate buffers to read
From: Soheil Hassas Yeganeh
Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Yuchung Cheng
Signed-off-by: Willem de Bruijn
Reviewed-by: Eric Dumazet
Reviewed-by: Neal Cardwell
---
tools/testing/selftests/net/Makefile | 3 +-
tools/testing/selftests/net/tcp_inq.c | 189
On Tue, May 1, 2018 at 2:34 PM, David Miller wrote:
> From: Soheil Hassas Yeganeh
> Date: Tue, 1 May 2018 10:11:27 -0400
>
>> +static inline int tcp_inq_hint(struct sock *sk)
>
> Please do not use 'inline' in foo.c files, let the compiler decide.
>
> Otherw
From: Soheil Hassas Yeganeh
Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Yuchung Cheng
Signed-off-by: Willem de Bruijn
Reviewed-by: Eric Dumazet
Reviewed-by: Neal Cardwell
---
tools/testing/selftests/net/Makefile | 3 +-
tools/testing/selftests/net/tcp_inq.c | 189
From: Soheil Hassas Yeganeh
Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer size to drain sockets. Knowing
the size of receive queue length, applications can optimize
how they allocate buffers to read
On Wed, May 2, 2018 at 11:25 PM, Eric Dumazet wrote:
> Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
> Signed-off-by: Eric Dumazet
> Reported-by: Michael Wenig
> Tested-by: Michael Wenig
Acked-by: Soheil Hassas Yeganeh
Thank you for catching and fixing this!
n and changes mmap()
> behavior.
>
> Second patch changes tcp_mmap reference program.
>
> v2:
> Added a missing page align of zc->length in tcp_zerocopy_receive()
> Properly clear zc->recv_skip_hint in case user request was completed.
Acked-by: Soheil Hassas Yeganeh
page
> aligned.
>
> Signed-off-by: Eric Dumazet
> Cc: Andy Lutomirski
> Cc: Soheil Hassas Yeganeh
Acked-by: Soheil Hassas Yeganeh
Thank you, again!
> ---
> tools/testing/selftests/net/tcp_mmap.c | 64 +++---
> 1 file changed, 37 insertions(+), 27 d
that memcg might require additional changes.
>
> Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive")
> Signed-off-by: Eric Dumazet
> Reported-by: syzbot
> Suggested-by: Andy Lutomirski
> Cc: linux...@kvack.org
> Cc: Soheil Hassas Yeganeh
ted-by: Eric Dumazet
> Cc: Eric Dumazet
> Cc: Priyaranjan Jha
> Cc: Yuchung Cheng
> Cc: Soheil Hassas Yeganeh
Acked-by: Soheil Hassas Yeganeh
Thank you for the nice patch series!
> Stanislav Fomichev (8):
> bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT
>
o renames the do_nonblock label since we might reach this
> code path even if we were in blocking mode.
>
> Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure")
> Signed-off-by: Eric Dumazet
> Cc: Jason Baron
> Reported-by: Vladimir Rutsky
Ac
and
> call sk->sk_write_space(sk) accordingly.
>
> Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue
> is empty")
> Signed-off-by: Eric Dumazet
> Cc: Jason Baron
> Reported-by: Vladimir Rutsky
> Cc: Soheil Hassas Yeganeh
> Cc: N
rnel/sched/idle.c:353
> start_secondary+0x404/0x5c0 arch/x86/kernel/smpboot.c:271
> secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>
> Fixes: 79861919b889 ("tcp: fix TCP_REPAIR xmit queue setup")
>
On Tue, Feb 26, 2019 at 5:55 PM Willem de Bruijn
wrote:
>
> From: Willem de Bruijn
>
> Signed-off-by: Willem de Bruijn
Acked-by: Soheil Hassas Yeganeh
> ---
> tools/testing/selftests/bpf/bpf_helpers.h | 29 +++
> 1 file changed, 29 insertions(+)
>
From: Soheil Hassas Yeganeh
Returning 0 as inq to userspace indicates there is no more data to
read, and the application needs to wait for EPOLLIN. For a connection
that has received FIN from the remote peer, however, the application
must continue reading until getting EOF (return value of 0
On Wed, Mar 20, 2019 at 10:50 AM Willem de Bruijn
wrote:
>
> From: Willem de Bruijn
>
> Sync include/uapi/linux/bpf.h with tools/
>
> Signed-off-by: Willem de Bruijn
Acked-by: Soheil Hassas Yeganeh
> ---
> tools/include/uapi/linux/bpf.h | 22
so that it can be freed by the cpu feeding the incoming packets in BH.
>
> This increased the performance of small RPC benchmark by about 10 % on a host
> with 112 hyperthreads.
>
> Eric Dumazet (3):
> net: convert rps_needed and rfs_needed to new static branch api
> tcp: add one skb
> > - Really test rps_needed in sk_eat_skb() as claimed.
> > - Fixed rps_needed use in drivers/net/tun.c
> >
> > Eric Dumazet (3):
> > net: convert rps_needed and rfs_needed to new static branch api
> > tcp: add one skb cache for tx
> > tcp: add one skb cache for rx
>
> Acked-by: Willem de Bruijn
Acked-by: Soheil Hassas Yeganeh
Thanks again!
> Thanks Eric!
From: Soheil Hassas Yeganeh
Add documentation to the tcp_ca_state enum, since this enum is
exposed in uapi.
Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Signed-off-by: Soheil Hassas Yeganeh
Cc: Sowmini Varadhan
---
include/uapi/linux/tcp.h | 27
r tx")
> Signed-off-by: Eric Dumazet
> Cc: Willem de Bruijn
> Cc: Soheil Hassas Yeganeh
Acked-by: Soheil Hassas Yeganeh
Thank you for the fix!
> ---
> include/net/sock.h | 9 +
> net/ipv4/tcp.c | 2 --
> 2 files changed, 5 insertions(+), 6 d
() instead
> of simply checking if the fast clone has been freed.
>
> Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx")
> Signed-off-by: Eric Dumazet
> Cc: Willem de Bruijn
> Cc: Soheil Hassas Yeganeh
Acked-by: Soheil Hassas Yeganeh
I can't think of other
On Wed, Mar 14, 2018 at 12:32 PM Willem de Bruijn <
willemdebruijn.ker...@gmail.com> wrote:
> On Tue, Mar 13, 2018 at 4:35 PM, Vinicius Costa Gomes
> wrote:
> > Hi,
> >
> > Changes from the RFC:
> > - tweaked commit messages;
> >
> > Original cover letter:
> >
> > This is actually a "bug report"
From: Soheil Hassas Yeganeh
tcp_write_queue_purge clears all the SKBs in the write queue
but does not reset the sk_send_head. As a result, we can have
a NULL pointer dereference anywhere that we use tcp_send_head
instead of the tcp_write_queue_tail.
For example, after a27fd7a8ed38 (tcp: purge
On Mon, Mar 19, 2018 at 10:16 AM Eric Dumazet
wrote:
> On 03/19/2018 07:03 AM, David Miller wrote:
> > From: Eric Dumazet
> > Date: Mon, 19 Mar 2018 05:17:37 -0700
> >
> >> We have sent a fix last week, I am not sure if David took it.
> >>
> >> https://patchwork.ozlabs.org/patch/886324/
> >
>
On Tue, Apr 3, 2018 at 11:19 AM Miroslav Lichvar wrote:
>
> I came across an interesting issue with error messages in sockets with
> enabled timestamping using the SOF_TIMESTAMPING_OPT_CMSG option. When
> the socket is connected and there is an error (e.g. due to destination
> unreachable ICMP), s
From: Soheil Hassas Yeganeh
Clear tp->packets_out when purging the write queue, otherwise
tcp_rearm_rto() mistakenly assumes TCP write queue is not empty.
This results in NULL pointer dereference.
Also, remove the redundant `tp->packets_out = 0` from
tcp_disconnect(), since tcp_disc
From: Soheil Hassas Yeganeh
Clear tp->packets_out when purging the write queue, otherwise
tcp_rearm_rto() mistakenly assumes TCP write queue is not empty.
This results in NULL pointer dereference.
Also, remove the redundant `tp->packets_out = 0` from
tcp_disconnect(), since tcp_disc
From: Soheil Hassas Yeganeh
recvmmsg does not call ___sys_recvmsg when sk_err is set.
That is fine for normal reads but, for MSG_ERRQUEUE, recvmmsg
should always call ___sys_recvmsg regardless of sk->sk_err to
be able to clear error queue. Otherwise, users are not able to
drain the error qu
From: Soheil Hassas Yeganeh
When the connection is reset, there is no point in
keeping the packets on the write queue until the connection
is closed.
RFC 793 (page 70) and RFC 793-bis (page 64) both suggest
purging the write queue upon RST:
https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis
re needed to fill the pipe when a device has
> suboptimal TSO limits.
> Eric Dumazet (2):
>tcp_bbr: better deal with suboptimal GSO (II)
>tcp_bbr: remove bbr->tso_segs_goal
Acked-by: Soheil Hassas Yeganeh
Thank you, Eric!
> include/net/tcp.h | 6 ++
From: Soheil Hassas Yeganeh
When the connection is aborted, there is no point in
keeping the packets on the write queue until the connection
is closed.
Similar to a27fd7a8ed38 ('tcp: purge write queue upon RST'),
this is essential for a correct MSG_ZEROCOPY implementation,
because
the swtstamp field from struct tcp_skb_cb
>
> Signed-off-by: Eric Dumazet
> Cc: Soheil Hassas Yeganeh
> Cc: Wei Wang
> Cc: Willem de Bruijn
Acked-by: Soheil Hassas Yeganeh
Very nice!
From: Soheil Hassas Yeganeh
We should only record RPS on normal reads and writes.
In single threaded processes, all calls record the same state. In
multi-threaded processes where a separate thread processes
errors, the RFS table mispredicts.
Note that, when CONFIG_RPS is disabled
From: Soheil Hassas Yeganeh
On multi-threaded processes, one common architecture is to have
one (or a small number of) threads polling sockets, and a
considerably larger pool of threads reading from and writing to the
sockets. When we set RPS core on tcp_poll() or udp_poll() we essentially
steer
From: Soheil Hassas Yeganeh
The user-provided value to setsockopt(SO_RCVLOWAT) can be
larger than the maximum possible receive buffer. Such values
mute POLLIN signals on the socket which can stall progress
on the socket.
Limit the user-provided value to half of the maximum receive
buffer, i.e
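
The clamp described above can be sketched in a few lines. This is an illustration only: clamp_rcvlowat is a hypothetical name, and max_rcvbuf stands in for whatever maximum receive buffer size the kernel actually consults.

```c
#include <assert.h>

/* Hypothetical helper: limit a requested SO_RCVLOWAT value to half of
 * the maximum receive buffer, so that POLLIN can still be signalled. */
int clamp_rcvlowat(int requested, int max_rcvbuf)
{
    int cap;

    assert(max_rcvbuf >= 0);
    cap = max_rcvbuf / 2;
    return requested < cap ? requested : cap;
}
```

With an assumed 6291456-byte maximum, a request of 10000000 is reduced to 3145728, while a request of 1000 passes through unchanged.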
From: Soheil Hassas Yeganeh
tcp_zerocopy_receive() rounds down the zc->length to a multiple of
PAGE_SIZE. This results in two issues:
- tcp_zerocopy_receive sets recv_skip_hint to the length of the
receive queue if the zc->length input is smaller than the
PAGE_SIZE, even though the d
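
The effect of rounding a length down to a page multiple can be illustrated in user space. This is only a sketch: SKETCH_PAGE_SIZE, round_down_to_page and sub_page_remainder are hypothetical names, and 4096 is an assumed page size rather than a value taken from the patch.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size, for illustration */

/* Round len down to a whole number of pages. */
unsigned int round_down_to_page(unsigned int len)
{
    /* The mask trick below only works for power-of-two page sizes. */
    assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1u)) == 0);
    return len & ~(SKETCH_PAGE_SIZE - 1u);
}

/* Bytes left over once the page-aligned part has been removed. */
unsigned int sub_page_remainder(unsigned int len)
{
    return len - round_down_to_page(len);
}
```

Any length below the page size rounds down to zero, which is why a request smaller than one page needs separate handling.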
"tcp: add one skb cache for tx")
> Signed-off-by: Eric Dumazet
> Cc: Soheil Hassas Yeganeh
> Cc: Willem de Bruijn
Acked-by: Soheil Hassas Yeganeh
Nice catch! Thank you!
> ---
> net/ipv4/tcp.c | 2 ++
> 1 file changed, 2 insertions(+)
>
ntly, calling it only once per RTT.
>
> Signed-off-by: Eric Dumazet
> Cc: Yuchung Cheng
> Cc: Neal Cardwell
> Cc: Soheil Hassas Yeganeh
> Cc: Florian Westphal
> Cc: Daniel Borkmann
> Cc: Lawrence Brakmo
> Cc: Abdul Kabbani
Acked-by: Soheil Hassas Yeganeh
Tha
:/
>
> Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx")
> Signed-off-by: Eric Dumazet
> Cc: Soheil Hassas Yeganeh
> Cc: Willem de Bruijn
Acked-by: Soheil Hassas Yeganeh
Nice catch! Thank you for the fix.
> ---
> net/ipv4/tcp.c | 2 +-
> 1 file changed, 1
From: Soheil Hassas Yeganeh
For EPOLLET, applications must call sendmsg until they get EAGAIN.
Otherwise, there is no guarantee that EPOLLOUT is sent if there was
a failure upon memory allocation.
As a result on high-speed NICs, userspace observes multiple small
sendmsgs after a partial sendmsg
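
The edge-triggered rule stated above (keep writing until EAGAIN) can be sketched as a user-space loop. This is an illustration under assumptions: send_until_eagain is a hypothetical helper, and it uses send() with MSG_DONTWAIT rather than sendmsg() to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: write until everything is sent or the kernel
 * reports EAGAIN/EWOULDBLOCK. Returns the number of bytes sent, or -1
 * on a hard error. *would_block is set to 1 when the loop stopped on
 * EAGAIN, which is the only point where waiting for EPOLLOUT is safe
 * with EPOLLET.
 */
ssize_t send_until_eagain(int fd, const char *buf, size_t len,
                          int *would_block)
{
    size_t off = 0;

    assert(buf != NULL && would_block != NULL);
    *would_block = 0;
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off,
                         MSG_DONTWAIT | MSG_NOSIGNAL);

        if (n > 0) {
            off += (size_t)n; /* partial write: keep going */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue; /* interrupted: retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            *would_block = 1; /* now wait for EPOLLOUT */
            break;
        }
        return -1; /* hard error (or an unexpected zero-length send) */
    }
    return (ssize_t)off;
}
```

A caller that stops after a partial write without seeing EAGAIN has no guarantee of a later EPOLLOUT edge, which is the situation the commit message describes.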
From: Soheil Hassas Yeganeh
If there was any event available on the TCP socket, tcp_poll()
will be called to retrieve all the events. In tcp_poll(), we call
sk_stream_is_writeable() which returns true as long as we are at least
one byte below notsent_lowat. This will result in quite a few
On Thu, Nov 29, 2018 at 10:56 AM Eric Dumazet wrote:
>
> We can remove the loop and conditional branches
> and compute wscale efficiently thanks to ilog2()
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Very nice, thank you, Eric!
> ---
> net/
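
To illustrate the idea in the quoted patch description, here is a user-space sketch comparing a shift loop with a log2-based computation of the window scale. It is not the kernel code: wscale_loop, wscale_log2, floor_log2_u32 and SKETCH_MAX_WSCALE are hypothetical names, floor_log2_u32 stands in for the kernel's ilog2(), and the limit of 14 is the maximum shift count from RFC 7323.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_WSCALE 14 /* RFC 7323 maximum shift count */

/* floor(log2(v)) for v > 0; stands in for the kernel's ilog2(). */
static int floor_log2_u32(uint32_t v)
{
    assert(v > 0);
#if defined(__GNUC__) || defined(__clang__)
    return 31 - __builtin_clz(v);
#else
    int r = 0;

    while (v >>= 1)
        r++;
    return r;
#endif
}

/* Loop form: halve the space until it fits in 16 bits. */
int wscale_loop(uint32_t space)
{
    int wscale = 0;

    while (space > UINT16_MAX && wscale < SKETCH_MAX_WSCALE) {
        space >>= 1;
        wscale++;
    }
    return wscale;
}

/* Log2 form: derive the shift from the position of the top set bit. */
int wscale_log2(uint32_t space)
{
    int w;

    if (space == 0)
        return 0;
    w = floor_log2_u32(space) - 15;
    if (w < 0)
        w = 0;
    if (w > SKETCH_MAX_WSCALE)
        w = SKETCH_MAX_WSCALE;
    return w;
}
```

Both forms agree on the boundary cases: 65535 needs no scaling, 65536 needs a shift of 1, and very large values saturate at 14.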
TCP is limited by receive
> window")
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Excellent catch! Thank you for the fix, Eric!
> ---
> net/ipv4/tcp_output.c | 5 -
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/n
"tcp: enable MSG_ZEROCOPY")
> Reported-by: Marek Majkowski
> Signed-off-by: Willem de Bruijn
> CC: Yuchung Cheng
> CC: Neal Cardwell
> CC: Soheil Hassas Yeganeh
> CC: Alexey Kodanev
Acked-by: Soheil Hassas Yeganeh
Thank you for the fix!
> ---
>
> This is a narro
/ipv4/tcp_minisocks.c | 34 ----------
> > 2 files changed, 20 insertions(+), 35 deletions(-)
> >
> > --
> Entire patch set looks great to me!
>
> Acked-by: Yuchung Cheng
Acked-by: Soheil Hassas Yeganeh
Thank you very much, Eric, for the nice code removal!
> > 2.20.1.321.g9e740568ce-goog
> >
> Signed-off-by: Eric Dumazet
> Cc: Willem de Bruijn
> Cc: Soheil Hassas Yeganeh
Acked-by: Soheil Hassas Yeganeh
Thank you, Eric!
> ---
> drivers/net/loopback.c | 4
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/loop
EOR flag.
>
> Both flags should be handled at the same time, after all other
> heuristics have been considered. They both mean that no more bytes
> can be added to this skb by an application.
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Thank you Eric for the
From: Soheil Hassas Yeganeh
When we have less than PAGE_SIZE of data on receive queue,
we set recv_skip_hint to 0. Instead, set it to the actual
number of bytes available.
Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Eric Dumazet
---
net/ipv4/tcp.c | 14 +-
1 file changed
From: Soheil Hassas Yeganeh
When SKBs are coalesced, we can have SKBs with different
frag sizes. Some with PAGE_SIZE and some not with PAGE_SIZE.
Since recv_skip_hint is always set to the full SKB size,
it can overestimate the amount that should be read using
normal read for coalesced packets
If there is no packet in retransmit queue, we should
> > avoid a NULL deref.
> >
> > Signed-off-by: Eric Dumazet
> > Reported-by: soukjin bae
> > ---
> > net/ipv4/tcp_ipv4.c | 5 -
> > 1 file changed, 4 insertions(+), 1 deletion(-)
>
>
compression or losses.
>
> We plan to add SACK compression in the following patch, we
> must therefore not call tcp_enter_quickack_mode()
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Thank you, Eric!
On Mon, May 21, 2018 at 6:08 PM, Eric Dumazet wrote:
> We want to add finer control of the number of ACK packets sent after
> ECN events.
>
> This patch is not changing current behavior, it only enables following
> change.
>
> Signed-off-by: Eric Dumazet
Acked-by:
nough.
>
> This should reduce the extra load noticed in DCTCP environments,
> after congestion events.
>
> This is part 2 of our effort to reduce pure ACK packets.
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Thanks for the patch!
ayed ACKs), then
> the TLP timer fires too quickly.
>
> Fixes: df92c8394e6e ("tcp: fix xmit timer to only be reset if data
> ACKed/SACKed")
> Signed-off-by: Neal Cardwell
> Signed-off-by: Yuchung Cheng
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Nice fix. Thank you, Neal!
ivers.
>
> It has been used at Google for about four years,
> and has been discussed at various networking conferences.
>
> [1] segments smaller than MSS already have PSH flag set
> by tcp_sendmsg() / tcp_mark_push(), unless MSG_MORE
> has been requested by the user.
>
_wnd, the receive window that the receiver has advertised to
> > > the sender.
> >
> > This serves the purpose of adding an additional __u32 to avoid the
> > would-be hole caused by the addition of the tcpi_rcv_ooopack field.
> >
> > Signed-off-by: Thomas Higdon
We'd like to announce the availability of transperf: a network
protocol performance testing tool.
transperf enables users to test TCP performance over a variety of
emulated network scenarios (using netem), including RTT, bottleneck
bandwidth, and policed rate that can change over time. The tool
su
m added later (but in same linux
>> version)
>> for rtx-rb-tree to fix the bug.
>>
>> Fixes: e2080072ed2d ("tcp: new list for sent but unacked skbs for RACK
>> recovery")
>> Signed-off-by: Eric Dumazet
>
> Acked-by: Neal Cardwell
Acked-by: Soheil Hassas Yeganeh
Nice! Thank you, Eric!
rate_threshold 10Mbit
> lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 50 -r 300,300 -o
> P99_LATENCY
> 99th Percentile Latency Microseconds
> 858
>
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Thank you, Eric!
ue.
>> ...
>>>
>>> Signed-off-by: Eric Dumazet
>>> Reported-by: liujian
>>> ---
>>> net/ipv4/tcp_output.c | 19 ---
>>> 1 file changed, 12 insertions(+), 7 deletions(-)
>>
>> Acked-by: Neal Cardwell
> Acked-by: Yuchung Cheng
Acked-by: Soheil Hassas Yeganeh
Very nice! Thank you, Eric!
igned-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Nice catch!
>
>> Even if RFC 7323 does not request it, this is consistent to what linux
>> did in the past, when TS values were based on jiffies.
>>
>> Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path")
>> Signed-off-by: Eric Dumazet
>> Cc: Soh
only deals with CHECKSUM_PARTIAL
> tcp: remove dead code from tcp_set_skb_tso_segs()
> tcp: remove dead code after CHECKSUM_PARTIAL adoption
Acked-by: Soheil Hassas Yeganeh
Very nice patch-series! Thank you, Eric!
>include/net/sock.h| 10
> 631
> > 517
> >
> > After patch :
> > # for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
> >1733 (ss -temoi shows cwnd is around 386 )
> >1778
> >1746
> >1781
> >1718
> >
> > Fixes: 0f8782ea14
ing control messages")
> Signed-off-by: Douglas Caetano dos Santos
Acked-by: Soheil Hassas Yeganeh
> ---
> net/packet/af_packet.c | 14 +++---
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index
From: Soheil Hassas Yeganeh
tcp_ack() can call tcp_fragment() which may deduct the
value tp->fackets_out when MSS changes. When prior_fackets
is larger than tp->fackets_out, tcp_clean_rtx_queue() can
invoke tcp_update_reordering() with negative values. This
results in absurd tp->
1
> 17 83 0 0
> 4 0 0 259829168 46024 27105840016 0 1688472 197158 1
> 17 82 0 0
> 3 0 0 259830224 46024 271040800 0 0 1692450 197212 0
> 18 82 0 0
>
> As expected, number of interrupts per second is very different.
>
From: Soheil Hassas Yeganeh
Commit bafbb9c73241 ("tcp: eliminate negative reordering
in tcp_clean_rtx_queue") fixes an issue for negative
reordering metrics.
To be resilient to such errors, warn and return
when a negative metric is passed to tcp_update_reordering().
Signed-off-
p only when necessary.
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
> ---
> net/ipv4/tcp_ipv4.c | 1 +
> net/ipv4/tcp_output.c | 21 +++--
> net/ipv4/tcp_recovery.c | 1 -
> net/ipv4/tcp_timer.c| 3 ++-
> 4 files changed, 14 insert
n the future to have 1ms TCP TS clock,
> regardless of HZ value, we want to cleanup things.
>
> tcp_jiffies32 is the truncated jiffies value,
> which will be used only in places where we want a 'host'
> timestamp.
>
> Signed-off-by: Eric Dumazet
Acked-by:
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet wrote:
> Use our own macro instead of abusing tcp_time_stamp
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
> ---
> net/dccp/ccids/ccid2.c | 8
> net/dccp/ccids/ccid2.h | 2 +-
> 2 files change
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet wrote:
> Use tcp_jiffies32 instead of tcp_time_stamp to feed
> tp->lsndtime.
>
> tcp_time_stamp will soon be a little bit more expensive
> than simply reading 'jiffies'.
>
> Signed-off-by: Eric Dumazet
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet wrote:
> Use tcp_jiffies32 instead of tcp_time_stamp, since
> tcp_time_stamp will soon be only used for TCP TS option.
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
> ---
> net/ipv4/tcp_bbr.c | 12 ++--
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet wrote:
> Use tcp_jiffies32 instead of tcp_time_stamp to feed
> tp->snd_cwnd_stamp.
>
> tcp_time_stamp will soon be a little bit more expensive
> than simply reading 'jiffies'.
>
> Signed-off-by: Eric Dumazet
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet wrote:
> Use tcp_jiffies32 instead of tcp_time_stamp, since
> tcp_time_stamp will soon be only used for TCP TS option.
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
> ---
> net/ipv4/tcp_bic.c | 6
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet wrote:
> Use tcp_jiffies32 instead of tcp_time_stamp, since
> tcp_time_stamp will soon be only used for TCP TS option.
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
> ---
> include/net/tcp.h|
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet wrote:
> tcp_time_stamp will no longer be tied to jiffies.
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
> ---
> net/ipv4/tcp.c| 2 +-
> net/ipv4/tcp_output.c | 2 +-
> 2 files changed, 2 inser
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet wrote:
> Use tcp_jiffies32 instead of tcp_time_stamp, since
> tcp_time_stamp will soon be only used for TCP TS option.
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
> ---
> net/ipv4/tcp_output.c | 6
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet wrote:
> tcp_time_stamp will become slightly more expensive soon,
> cache its value.
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
> ---
> net/ipv4/tcp_lp.c | 7 ---
> 1 file changed, 4 insertions(+), 3
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet wrote:
> After this patch, all uses of tcp_time_stamp will require
> a change when we introduce 1 ms and/or 1 us TCP TS option.
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
> ---
> net/ipv4/tcp.c
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet wrote:
> This CC does not need 1 ms tcp_time_stamp and can use
> the jiffy based 'timestamp'.
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
> ---
> net/ipv4/tcp_westwood.c | 6 +++---
> 1 f
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet wrote:
> This place wants to use tcp_jiffies32, this is good enough.
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
> ---
> net/ipv4/tcp_input.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
/slides-97-tcpm-tcp-options-for-low-latency-00.pdf
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
> ---
> include/linux/skbuff.h | 62 +-
> include/linux/tcp.h | 22 -
> include/net/tcp.h
On Tue, May 16, 2017 at 8:44 AM, Miroslav Lichvar wrote:
> Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message
> for incoming packets with hardware timestamps. It contains the index of
> the real interface which received the packet and the length of the
> packet at layer 2.
>
rto from jiffies to usec, compute a time difference
> in usec, then convert the delta to HZ units.
>
> Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Thank you for the quick fix, Eric!
ption (RFC 7323) to 1ms clock")
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Thank you for the fix, Eric!
4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
> Signed-off-by: Eric Dumazet
> Signed-off-by: Yuchung Cheng
Acked-by: Soheil Hassas Yeganeh
Nice!
On Thu, Jun 1, 2017 at 10:00 AM, Cyril Hrubis wrote:
> I've bisected the problem to this commit:
>
> commit f5f99309fa7481f59a500f0d08f3379cd6424c1f (HEAD, refs/bisect/bad)
> Author: Soheil Hassas Yeganeh
> Date: Thu Nov 3 18:24:27 2016 -0400
>
>
g
patch to see if it fixes your issue?
From 3ec438460425d127741b20f03f78644c9e441e8c Mon Sep 17 00:00:00 2001
From: Soheil Hassas Yeganeh
Date: Thu, 1 Jun 2017 10:34:09 -0400
Subject: [PATCH net] sock: reset sk_err when the error queue is empty
Before f5f99309fa74 (sock: do
On Thu, Jun 1, 2017 at 11:10 AM, Cyril Hrubis wrote:
>> Thank you for the confirmation. Could you please try the following
>> patch to see if it fixes your issue?
>
> Does not seem to help, I still got the same busy loop.
Thank you for trying the patch. Unfortunately, I can't reproduce on my
mac
On Thu, Jun 1, 2017 at 11:36 AM, Cyril Hrubis wrote:
> It seems to repeatedly produce (until I plug the cable back):
>
> ee_errno = 113 ee_origin = 2 ee_type = 3 ee_code = 1 ee_info = 0 ee_data = 0
>
> So we get EHOSTUNREACH on SO_EE_ORIGIN_ICMP.
Thank you very much! I have a wild guess that, whe