Removing two 4-byte holes allows the kmalloc-32
kmem cache to be used instead of kmalloc-64 on 64-bit kernels.
Signed-off-by: Eric Dumazet
---
include/linux/igmp.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/igmp.h b/include/linux/igmp.h
index
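The effect described in the commit message above can be illustrated with a small user-space sketch. The two structures below are hypothetical (they are not the layout touched in include/linux/igmp.h), uint64_t stands in for pointer-sized members, and slab_bucket() is a simplified power-of-two rounding assumed only for illustration; the real kmalloc caches also include non-power-of-two sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: each 32-bit member is followed by 4 bytes of padding
 * when 64-bit members require 8-byte alignment (typical 64-bit ABIs). */
struct with_holes {
	uint64_t next;      /* stands in for a pointer */
	uint32_t addr;      /* 4-byte hole follows */
	uint64_t count[2];
	uint32_t flags;     /* 4 bytes of tail padding follow */
};

/* Same members, reordered so the two 32-bit fields share one 8-byte slot. */
struct without_holes {
	uint64_t next;
	uint32_t addr;
	uint32_t flags;
	uint64_t count[2];
};

static_assert(sizeof(struct without_holes) == 32, "no padding expected");

/* Simplified, assumed model of a size-class allocator:
 * round the request up to the next power of two, starting at 8. */
static size_t slab_bucket(size_t size)
{
	size_t bucket = 8;

	while (bucket < size)
		bucket <<= 1;
	return bucket;
}
```

Under these assumptions the padded layout needs 40 bytes and lands in a 64-byte bucket, while the reordered one fits a 32-byte bucket exactly.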
ys_setsockopt net/socket.c:2086 [inline]
[<ac198ef0>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086
[<0a770437>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
[<d3adb93b>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 9c8bb163a
On Wed, May 22, 2019 at 6:19 PM kbuild test robot wrote:
>
> tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
> head: 3580d04aa674383c42de7b635d28e52a1e5bc72c
> commit: 3580d04aa674383c42de7b635d28e52a1e5bc72c [11/11] ipv4/igmp: fix
> another memory leak in igmpv3_de
ip_sf_list_clear_all() needs to be defined even if !CONFIG_IP_MULTICAST
Fixes: 3580d04aa674 ("ipv4/igmp: fix another memory leak in
igmpv3_del_delrec()")
Signed-off-by: Eric Dumazet
Reported-by: kbuild test robot
---
net/ipv4/igmp.c | 22 +++---
1 file changed, 11
(us)
>
> SO_TXTIME ipv6 clock monolithic
> payload:a delay:10049 expected:1 (us)
>
> SO_TXTIME ipv4 clock monolithic
> payload:a delay:10105 expected:1 (us)
Thanks for the test Willem.
Acked-by: Eric Dumazet
And pass an extra parameter, since we will soon
dynamically allocate fqdir structures.
Signed-off-by: Eric Dumazet
---
include/net/inet_frag.h | 3 ++-
net/ieee802154/6lowpan/reassembly.c | 3 +--
net/ipv4/ip_fragment.c | 3 +--
net/ipv6/netfilter
Signed-off-by: Eric Dumazet
---
include/net/inet_frag.h | 2 +-
net/ieee802154/6lowpan/reassembly.c | 4 ++--
net/ipv4/inet_fragment.c| 4 ++--
net/ipv4/ip_fragment.c | 4 ++--
net/ipv6/netfilter/nf_conntrack_reasm.c | 4 ++--
net/ipv6
backport it once soaked a bit.
Eric Dumazet (11):
inet: rename netns_frags to fqdir
net: rename inet_frags_exit_net() to fqdir_exit()
net: rename struct fqdir fields
ipv4: no longer reference init_net in ip4_frags_ns_ctl_table[]
ipv6: no longer reference init_net in ip6_frags_ns_ctl_table
(struct net *)->ipv4.fqdir will soon be a pointer, so make
sure ip4_frags_ns_ctl_table[] does not reference init_net.
ip4_frags_ns_ctl_register() can perform the needed initialization
for all netns.
Signed-off-by: Eric Dumazet
---
net/ipv4/ip_fragment.c | 18 ++
1 file chan
(struct net *)->ipv6.fqdir will soon be a pointer, so make
sure ip6_frags_ns_ctl_table[] does not reference init_net.
ip6_frags_ns_ctl_register() can perform the needed initialization
for all netns.
Signed-off-by: Eric Dumazet
---
net/ipv6/reassembly.c | 15 +--
1 file changed
Rename the @frags fields from structs netns_ipv4, netns_ipv6,
netns_nf_frag and netns_ieee802154_lowpan to @fqdir
Signed-off-by: Eric Dumazet
---
include/net/netns/ieee802154_6lowpan.h | 2 +-
include/net/netns/ipv4.h| 2 +-
include/net/netns/ipv6.h| 4
(struct net *)->ieee802154_lowpan.fqdir will soon be a pointer, so make
sure lowpan_frags_ns_ctl_table[] does not reference init_net.
lowpan_frags_ns_sysctl_register() can perform the needed initialization
for all netns.
Signed-off-by: Eric Dumazet
---
net/ieee802154/6lowpan/reassembly.c |
1) struct netns_frags is renamed to struct fqdir
This structure is really holding many frag queues in a hash table.
2) (struct inet_frag_queue)->net field is renamed to fqdir
since net is generally associated to a 'struct net' pointer
in networking stack.
Signed-off-by:
fqdir will soon be dynamically allocated.
We need to reach the struct net pointer from fqdir,
so add it, and replace the various container_of() constructs
by direct access to the new field.
Signed-off-by: Eric Dumazet
---
include/net/inet_frag.h | 5 -
net/ieee802154
(struct net *)->nf_frag.fqdir will soon be a pointer, so make
sure nf_ct_frag6_sysctl_table[] does not reference init_net.
nf_ct_frag6_sysctl_register() can perform the needed initialization
for all netns.
Signed-off-by: Eric Dumazet
---
net/ipv6/netfilter/nf_conntrack_reasm.c |
Following patch will add rcu grace period before fqdir
rhashtable destruction, so we need to dynamically allocate
fqdir structures to not force expensive synchronize_rcu() calls
in netns dismantle path.
Signed-off-by: Eric Dumazet
---
include/net/inet_frag.h | 17
fb fb fb fb fb
^
8880a6497b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
8880a6497c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
---
include/net/inet_fra
: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
---
include/net/inet_frag.h | 3 +++
net/ipv4/inet_fragment.c | 20 ++--
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/include/net/inet_frag.h b/include/net/inet
smantle
at module removal.
Eric Dumazet (3):
inet: frags: uninline fqdir_init()
inet: frags: call inet_frags_fini() after unregister_pernet_subsys()
inet: frags: fix use-after-free read in inet_frag_destroy_rcu
include/net/inet_frag.h | 23 +++--
net/ieee802154/6lowpan/re
the following patch.
Fixes: d4ad4d22e7ac ("inet: frags: use kmem_cache for inet_frag_queue")
Signed-off-by: Eric Dumazet
---
net/ieee802154/6lowpan/reassembly.c | 2 +-
net/ipv6/reassembly.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ieee8021
fqdir_init() is not fast path and is getting bigger.
Signed-off-by: Eric Dumazet
---
include/net/inet_frag.h | 20 +---
net/ipv4/inet_fragment.c | 19 +++
2 files changed, 20 insertions(+), 19 deletions(-)
diff --git a/include/net/inet_frag.h b/include/net
:1972 [inline]
[<922d78d9>] __x64_sys_sendto+0x2a/0x30 net/socket.c:1972
[<cec820c1>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
[<0c32554f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 1da177e4c3f4 ("
On Mon, May 27, 2019 at 11:34 PM Herbert Xu wrote:
>
> Hi Eric:
>
> Eric Dumazet wrote:
> >
> > +void fqdir_exit(struct fqdir *fqdir)
> > +{
> > + fqdir->high_thresh = 0; /* prevent creation of new frags */
> > +
> > +
On 5/27/19 8:49 PM, brakmo wrote:
> Create new macro BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() to be used by
> __cgroup_bpf_run_filter_skb for EGRESS BPF progs so BPF programs can
> request cwr for TCP packets.
>
...
> +#define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)
On 5/28/19 11:28 AM, Sergej Benilov wrote:
> Since commit 605ad7f184b60cfaacbc038aa6c55ee68dee3c89 "tcp: refine TSO
> autosizing",
> the TSQ limit is computed as the smaller of
> sysctl_tcp_limit_output_bytes and max(2 * skb->truesize, sk->sk_pacing_rate
> >> 10).
> For low pacing rates, this
On 5/28/19 11:34 AM, Sergej Benilov wrote:
> Since commit 605ad7f184b60cfaacbc038aa6c55ee68dee3c89 "tcp: refine TSO
> autosizing",
> the TSQ limit is computed as the smaller of
> sysctl_tcp_limit_output_bytes and max(2 * skb->truesize, sk->sk_pacing_rate
> >> 10).
> For low pacing rates, this
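The limit described in the quoted text can be written out as plain arithmetic. This is only a sketch of the formula as stated in the message; tsq_limit() and its parameter names are hypothetical, not the kernel function.

```c
/* Hypothetical helper mirroring the formula quoted above:
 *   limit = min(limit_output_bytes, max(2 * truesize, pacing_rate >> 10))
 */
static unsigned long tsq_limit(unsigned long limit_output_bytes,
			       unsigned long truesize,
			       unsigned long pacing_rate)
{
	unsigned long limit = 2 * truesize;

	/* pacing_rate >> 10 is roughly the bytes sent in one millisecond */
	if ((pacing_rate >> 10) > limit)
		limit = pacing_rate >> 10;

	/* never exceed the configured upper bound */
	if (limit > limit_output_bytes)
		limit = limit_output_bytes;

	return limit;
}
```

With a low pacing rate the pacing_rate >> 10 term is small, so the result is governed by 2 * truesize.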
On 5/28/19 11:54 AM, Lawrence Brakmo wrote:
> On 5/28/19, 6:43 AM, "netdev-ow...@vger.kernel.org on behalf of Eric Dumazet"
> wrote:
>
> Why are you using preempt_enable_no_resched() here ?
>
> Because that is what __BPF_PROG_RUN
On 5/28/19 1:50 PM, Davide Caratti wrote:
> Like it has been done in commit 2ecba2d1e45b ("net: sched: act_csum: Fix
> csum calc for tagged packets"), also 'pedit' needs to adjust the network
> offset when multiple tags are present in the packets: otherwise wrong IP
> headers (but good checksums
;
> Therefore this patch removes the unnecessary smp_store_release call
> as well as the corresponding READ_ONCE on the read-side in order to
> not confuse future readers of this code. Comments have been added
> in their places.
>
> Signed-off-by: Herbert Xu
>
SGTM, thanks.
Reviewed-by: Eric Dumazet
David, this targets net-next tree :)
x1b3/0x2f0 fs/read_write.c:1015
do_writev+0x15b/0x330 fs/read_write.c:1058
Fixes: a50e233c50db ("net-gro: restore frag0 optimization")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
---
net/core/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/dev.c
On 5/29/19 3:15 PM, Sebastian Andrzej Siewior wrote:
> netdev_alloc_frag() can be used from any context and is used by NAPI
> and non-NAPI drivers. Non-NAPI drivers use it in interrupt context
> and NAPI drivers use it during initial allocation (->ndo_open() or
> ->ndo_change_mtu()). Some NAPI d
On 5/31/19 1:29 AM, Young Xiao wrote:
> There is a possible null pointer dereference bug in neigh_fill_info(),
> which is similar to the bug which was fixed in commit 6adc5fd6a142
> ("net/neighbour: fix crash at dumping device-agnostic proxy entries").
>
> Signed-off-by: Young Xiao <92siuy...@gm
On 5/31/19 1:34 AM, Young Xiao wrote:
> There is a possible null pointer dereference bug in vxlan_fdb_info(),
> which is similar to the bug which was fixed in commit 6adc5fd6a142
> ("net/neighbour: fix crash at dumping device-agnostic proxy entries").
>
> Signed-off-by: Young Xiao <92siuy...@gmai
On 5/31/19 7:45 AM, Herbert Xu wrote:
> On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote:
>>
>> OK, let's call it barrier. But we need more than a barrier here then.
>
> READ_ONCE/WRITE_ONCE is not some magical dust that you sprinkle
> around in your code to make it work without lo
On Fri, May 31, 2019 at 9:29 AM Andrea Parri
wrote:
>
> On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> > On 5/31/19 7:45 AM, Herbert Xu wrote:
>
> > > In this case the code doesn't need them because an implicit
> > > barrier() (which i
On Fri, May 31, 2019 at 10:11 AM Paul E. McKenney wrote:
>
> On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> >
> >
> > On 5/31/19 7:45 AM, Herbert Xu wrote:
> > > On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote:
> > >>
In general, this_cpu_read(*X) is faster than *this_cpu_ptr(X)
Also remove the inline attribute, totally useless.
Signed-off-by: Eric Dumazet
Cc: Kefeng Wang
---
net/ipv6/icmp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index
On 5/3/19 8:01 AM, Paolo Abeni wrote:
> This avoids an indirect call per RX IPv6/IPv4 packet.
> Note that we don't want to use the indirect calls helper for taps.
>
> Signed-off-by: Paolo Abeni
> ---
> net/core/dev.c | 6 --
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --gi
On 5/31/19 2:38 PM, David Ahern wrote:
> On 5/31/19 3:29 PM, David Miller wrote:
>> David, can you add some supplementary information to your cover letter
>> et al. which seems to be part of what Alexei is asking for and seems
>> quite reasonable?
>>
>
> It is not clear to me what more is want
this_cpu_read(*X) is faster than *this_cpu_ptr(X)
Signed-off-by: Eric Dumazet
---
net/ipv6/route.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index
fada5a13bcb2a286bb20a350c1873b1b16dc866a
this_cpu_read(*X) is faster than *this_cpu_ptr(X)
Signed-off-by: Eric Dumazet
---
net/ipv4/icmp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index
f3a5893b1e8619716f19f85dc77f2e1e12284b4d..49d6b037b113e85877f8e689e690f1c0d3427386
this_cpu_read(*X) is slightly faster than *this_cpu_ptr(X)
Signed-off-by: Eric Dumazet
---
net/ipv4/tcp_ipv4.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index
af81e4a6a8d8eac9aad551a129384ff6b1bf2f6c
On 5/31/19 7:29 PM, David Ahern wrote:
> On 5/31/19 7:04 PM, Eric Dumazet wrote:
>>
>> I have a bunch (about 15) of syzbot reports, probably caused by your latest
>> patch series.
>>
>> Do we want to stabilize first, or do you expect this new pa
On 5/31/19 7:34 PM, Eric Dumazet wrote:
>
>
> On 5/31/19 7:29 PM, David Ahern wrote:
>> On 5/31/19 7:04 PM, Eric Dumazet wrote:
>>>
>>> I have a bunch (about 15) of syzbot reports, probably caused by your
>>> latest patch series.
>>>
>>
kbuff fraglist splitter")
Fixes: c8b17be0b7a4 ("net: ipv4: add skbuff fraglist splitter")
Signed-off-by: Eric Dumazet
Cc: Pablo Neira Ayuso
---
include/net/ip.h | 1 -
include/net/ipv6.h| 1 -
net/ipv4/ip_output.c | 5 ++---
net/ipv6/ip6_output.c | 5 ++---
net/ipv6/netfilter.
stead we can wrap the list_func invocation.
>
> v1 -> v2:
> - use the correct fix tag
>
> Fixes: f5737cbadb7d ("net: use indirect calls helpers for ptype hook")
> Suggested-by: Eric Dumazet
> Signed-off-by: Paolo Abeni
> Acked-by: Edward Cree
> ---
Reviewed-by: Eric Dumazet
Signed-off-by: Eric Dumazet
---
net/ipv6/tcp_ipv6.c | 13 ++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index
beaf284563015ef0677c39fc056e6ecde3518920..07684f1e02f773a9d3e22a86ae4e7b853cc0b73e
100644
--- a/net/ipv6/tcp_ip
On 6/4/19 12:29 PM, Eric Dumazet wrote:
> This extends commit 22b6722bfa59 ("ipv6: Add sysctl for per
> namespace flow label reflection"), for some TCP RST packets.
>
> When RST packets are sent because no socket could be found,
> it makes sense to use flowlabel_reflec
On 6/5/19 3:49 AM, Zhiqiang Liu wrote:
> small cleanup: "struct request_sock_queue *queue" parameter of
> reqsk_queue_unlink
> func is never used in the func, so we can remove it.
>
> Signed-off-by: Zhiqiang Liu
> ---
SGTM
Reviewed-by: Eric Dumazet
First patch allows to reflect incoming IPv6 flowlabel
on RST packets sent when no socket could handle the packet.
Second patch makes sure we send the same flowlabel
for RST or ACK packets on behalf of TIME_WAIT sockets.
Eric Dumazet (2):
ipv6: tcp: enable flowlabel reflection in some RST
In order to provide full control of this new feature,
flowlabel_reflect becomes a bitmask.
Signed-off-by: Eric Dumazet
---
Documentation/networking/ip-sysctl.txt | 20 +++-
net/ipv6/af_inet6.c| 2 +-
net/ipv6/sysctl_net_ipv6.c | 3 +++
net/ipv6/
flowlabel.
Signed-off-by: Eric Dumazet
Cc: Florent Fourcot
---
net/ipv6/tcp_ipv6.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index
4ccb06ea8ce32d614fc0848e1c4e74b441fa1f2c..f4e609a48e68442693936050c2336ca1e80e1710
100644
--- a/net/ipv6/
RCU 101 : Before taking a refcount, make sure the object is not already
scheduled for deletion.
Fixes: 18367681a10b ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
Signed-off-by: Eric Dumazet
Cc: Willem de Bruijn
---
net/ipv6/ip6_flowlabel.c | 4 ++--
1 fi
On Thu, Jun 6, 2019 at 2:22 PM Eric Dumazet wrote:
>
> RCU 101 : Before taking a refcount, make sure the object is not already
> scheduled for deletion.
>
I will send a V2, there is a second atomic_inc() which needs to be
changed in ipv6_flowlabel_opt()
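The rule stated in this thread (only take a reference if the count has not already dropped to zero) can be sketched in user space with C11 atomics. The kernel offers atomic_inc_not_zero() for this pattern; the code below is an analogue written for illustration, with a hypothetical struct obj and helper name, and is not the patch itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; a count of zero means the object
 * is already on its way to being freed. */
struct obj {
	atomic_int refcnt;
};

/* Take a reference only if the object is still live.
 * A plain increment could resurrect an object whose count already hit zero. */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}
```

A lookup that finds the object under RCU protection would call the helper and treat a false return as "not found".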
Before taking a refcount, make sure the object is not already
scheduled for deletion.
Same fix is needed in ipv6_flowlabel_opt()
Fixes: 18367681a10b ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
Signed-off-by: Eric Dumazet
Cc: Willem de Bruijn
---
net/ipv6/ip6_fl
2d2e..ca17b50c 100644
>> --- a/drivers/net/ethernet/sis/sis900.c
>> +++ b/drivers/net/ethernet/sis/sis900.c
>> @@ -1604,6 +1604,7 @@ sis900_start_xmit(struct sk_buff *skb, struct
>> net_device *net_dev)
>> unsigned int index_cur_tx, index_dirty_tx;
>> unsigned int cou
mit 0f85feae6b71 ("tcp: fix
more NULL deref after prequeue changes"), I should have known better.
Fixes: 323a53c41292 ("ipv6: tcp: enable flowlabel reflection in some RST
packets")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
---
net/ipv6/tcp_ipv6.c | 2 +-
1 file changed, 1 i
txhash into the TIME_WAIT socket.
After this patch, ACK or RST packets sent on behalf of
a TIME_WAIT socket have the flowlabel that was previously
used by the flow.
Signed-off-by: Eric Dumazet
---
include/net/inet_timewait_sock.h | 1 +
net/ipv4/tcp_minisocks.c | 1 +
net/
lso provides a socket pointer to sock_net_uid() calls.
Fixes: 00483690552c ("tcp: Add mark for TIMEWAIT sockets")
Signed-off-by: Eric Dumazet
Cc: Jon Maxwell
---
net/ipv4/tcp_ipv4.c | 6 --
net/ipv6/tcp_ipv6.c | 1 +
2 files changed, 5 insertions(+), 2 deletions(-)
diff --g
On Mon, Jun 10, 2019 at 3:04 PM David Miller wrote:
>
> From: Eric Dumazet
> Date: Mon, 10 Jun 2019 14:45:43 -0700
>
> > Using sk_to_full_sk() should get back to the listener socket.
>
> net/ipv6/tcp_ipv6.c: In function ‘tcp_v6_send_response’:
> net/ipv6/tcp_ipv6
> On Tue, Jun 11, 2019 at 7:45 AM Eric Dumazet wrote:
> >
> > TCP can send ACK packets on behalf of SYN_RECV sockets.
> >
> > tcp_v4_send_ack() and tcp_v6_send_response() incorrectly
> > dereference sk->sk_mark for non TIME_WAIT sockets.
> >
> >
> This fix is not perfect, it only unsets the flag but does not set it back
> because we have to save the information somewhere in the qdisc if we
> really want that.
>
> Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
> Cc: Eric Dumazet
> Signed-off-by: Cong
On 7/16/19 2:26 AM, Petar Penkov wrote:
> From: Petar Penkov
>
> This helper function allows BPF programs to try to generate SYN
> cookies, given a reference to a listener socket. The function works
> from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
> socket in both cases
On 7/17/19 10:03 AM, Chuhong Yuan wrote:
> Variable allocated by kvmalloc should not be freed by kfree.
> Because it may be allocated by vmalloc.
> So replace kfree with kvfree here.
>
> Signed-off-by: Chuhong Yuan
> ---
Please add corresponding Fixes: tag, thanks !
> drivers/net/ethernet/m
we setup in any scenario, otherwise our packets steering
> > policy could not be enforced.
> ...
>
> Eric I think your feedback was addressed, please review to confirm.
Yes, this seems good to me, thanks.
Reviewed-by: Eric Dumazet
On 7/18/19 9:49 AM, Jacob Wen wrote:
> Hi,
>
> inet_gro_receive verifies IP csum but a NIC already did so and set
> CHECKSUM_UNNECESSARY.
>
>
> https://github.com/torvalds/linux/blob/v5.2/net/ipv4/af_inet.c#L1432-L1433
>
> if (unlikely(ip_fast_csum((u8 *)iph, 5)))
>
> goto out_unlo
ix is to add a new parameter to tcp_set_congestion_control(),
so that the ns_capable() call is only performed under the right
context.
Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Eric Dumazet
Cc: Lawrence Brakmo
Reported-by: Neal Cardwell
---
include/net
nt() should apply sane memory limits")
Signed-off-by: Eric Dumazet
Reported-by: Andrew Prout
Tested-by: Andrew Prout
Tested-by: Jonathan Lemon
Tested-by: Michal Kubecek
Acked-by: Neal Cardwell
Acked-by: Yuchung Cheng
Acked-by: Christoph Paasch
Cc: Jonathan Looney
---
include/ne
On Tue, Jul 23, 2019 at 2:20 AM Petar Penkov wrote:
>
> From: Petar Penkov
>
> This patch series introduces a BPF helper function that allows generating SYN
> cookies from BPF. Currently, this helper is enabled at both the TC hook and
> the
> XDP hook.
Please provide performance numbers ?
We
Use BPF_REG_1 for source and destination of gso_segs read,
to exercise "bpf: fix access to skb_shared_info->gso_segs" fix.
Signed-off-by: Eric Dumazet
Suggested-by: Stanislav Fomichev
---
tools/testing/selftests/bpf/verifier/ctx_skb.c | 11 +++
1 file changed, 11 insert
w BPF programs access skb_shared_info->gso_segs
field")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
---
net/core/filter.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index
4e2a79b2fd77f36ba2
First patch changes the kernel, second patch
adds a new test.
Note that other patches might be needed to take
care of similar issues in sock_ops_convert_ctx_access()
and SOCK_OPS_GET_FIELD()
Eric Dumazet (2):
bpf: fix access to skb_shared_info->gso_segs
selftests/bpf: add another gso_s
On 7/23/19 7:28 PM, Ioana Radulescu wrote:
> Using Rx skb bulking for all frames may negatively impact the
> performance in some TCP termination scenarios, as it effectively
> bypasses GRO.
>
> - list_add_tail(&skb->list, ch->rx_list);
> + if (frame_is_tcp(fd, fas))
> + na
On Sat, Jul 27, 2019 at 4:23 AM Josh Hunt wrote:
>
> The current implementation of TCP MTU probing can considerably
> underestimate the MTU on lossy connections allowing the MSS to get down to
> 48. We have found that in almost all of these cases on our networks these
> paths can handle much large
On Sun, Jul 28, 2019 at 1:21 AM Josh Hunt wrote:
>
> On 7/27/19 12:05 AM, Eric Dumazet wrote:
> > On Sat, Jul 27, 2019 at 4:23 AM Josh Hunt wrote:
> >>
> >> The current implementation of TCP MTU probing can considerably
> >> underestimate the MTU on lossy c
On 8/8/19 1:52 AM, Josh Hunt wrote:
> TCP_BASE_MSS is used as the default initial MSS value when MTU probing is
> enabled. Update the comment to reflect this.
>
> Suggested-by: Neal Cardwell
> Signed-off-by: Josh Hunt
> ---
Signed-off-by: Eric Dumazet
> The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives
> administrators the ability to control the floor of MSS probing.
>
> Signed-off-by: Josh Hunt
Signed-off-by: Eric Dumazet
ue does not provide much value either way.
>
> Signed-off-by: Daniel Borkmann
> Cc: Eric Dumazet
> Cc: Alexei Starovoitov
> Cc: Willem de Bruijn
> Cc: Martynas Pumputis
> ---
> include/net/net_namespace.h | 1 -
> include/uapi/linux/bpf.h| 4 ++--
> net/core/sock_
On Thu, Aug 8, 2019 at 1:09 PM Daniel Borkmann wrote:
>
> On 8/8/19 12:45 PM, Eric Dumazet wrote:
> > On Thu, Aug 8, 2019 at 11:50 AM Daniel Borkmann
> > wrote:
> >
> >> Socket cookie consumers must assume the value as opaque in any case.
> >> The
On Fri, Aug 9, 2019 at 10:04 AM Holger Hoffstätte
wrote:
>
> On 8/8/19 10:08 PM, Heiner Kallweit wrote:
> (..snip..)
> >>>
> >>> I was about to ask exactly that, whether you have TSO enabled. I don't
> >>> know what
> >>> can trigger the HW issue, it was just confirmed by Realtek that this chip
as Yeganeh
Signed-off-by: Eric Dumazet
---
net/ipv4/tcp.c | 11 ++-
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index
a0a66321c0ee99918b2080219dbaefcf3c398e13..f8fa1686f7f3e64f5d4ea8163e7f87538cc0d672
100644
--- a/net/ipv4/tcp.c
+++ b/net/
On 8/12/19 9:32 AM, Ying Xue wrote:
> syzbot found the following memory leak:
>
> [ 68.602482][ T7130] kmemleak: 2 new suspected memory leaks (see
> /sys/kernel/debug/kmemleak)
> BUG: memory leak
> unreferenced object 0x88810df83c00 (size 512):
> comm "softirq", pid 0, jiffies 42949423
xe7
Fixes: ab84be7e54fc ("net: Initial nexthop code")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
Cc: David Ahern
---
net/ipv4/nexthop.c | 12 ++--
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index
5fe5a3981d4316ad8d9d
/0x70 net/socket.c:2305
do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x440209
Fixes: b60620cf567b ("batman-adv: netlink: hardif query")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
Cc: Marek Lindner
Cc: Simon Wunderlich
C
On 8/12/19 7:51 PM, Ioana Ciocoi Radulescu wrote:
>> -Original Message-
>> From: Edward Cree
>> Sent: Friday, August 9, 2019 8:32 PM
>> To: Ioana Ciocoi Radulescu
>> Cc: David Miller ; netdev ;
>> Eric Dumazet ; linux-net-driv...@solarflare.com
>
e with
> NL_VALIDATE_STRICT for the validate argument very much like
> nlmsg_parse_deprecated is for NL_VALIDATE_LIBERAL.
>
> Fixes: 3de6440354465 ("netlink: re-add parse/validate functions in strict
> mode")
> Reported-by: Eric Dumazet
> Reported-by: syzbot
>
From: Eric Dumazet
This reverts commit c67f5db82027ba6d2ea4ac9176bc45996a03ae6a.
While using page fragments instead of a kmalloc backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
memory use underestimation.
GOOD_COPY_LEN is 128 bytes. T
On 1/12/21 6:25 PM, Paolo Abeni wrote:
> Instead of re-implementing most of inet_shutdown, re-use
> such helper, and implement the MPTCP-specific bits at the
> 'proto' level.
>
> The msk-level disconnect() can now be invoked, lets provide a
> suitable implementation.
>
> As a side effect, this
On 1/13/21 11:21 AM, Eric Dumazet wrote:
>
>
> On 1/12/21 6:25 PM, Paolo Abeni wrote:
>> Instead of re-implementing most of inet_shutdown, re-use
>> such helper, and implement the MPTCP-specific bits at the
>> 'proto' level.
>>
>> The msk-leve
From: Eric Dumazet
Both virtio net and napi_get_frags() allocate skbs
with a very small skb->head
While using page fragments instead of a kmalloc backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
underestimating memory usage.
Fo
From: Eric Dumazet
iproute2 probably never goes beyond 8 for the cell exponent,
but stick to the max shift exponent for signed 32bit.
UBSAN reported:
UBSAN: shift-out-of-bounds in net/sched/sch_api.c:389:22
shift exponent 130 is too large for 32-bit type 'int'
CPU: 1 PID: 8450
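The UBSAN report above comes from shifting by an exponent that is at least as wide as the type. A minimal sketch of the kind of bound check involved follows; checked_shift_left() is a hypothetical helper, it uses an unsigned 32-bit value to keep the example free of signed-overflow concerns, and the bound chosen by the actual patch is not shown in the excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: refuse exponents that are not valid for a
 * 32-bit operand instead of performing an undefined shift. */
static bool checked_shift_left(uint32_t value, unsigned int exp, uint32_t *out)
{
	if (exp >= 32)
		return false;   /* e.g. the exponent 130 from the report */

	*out = value << exp;
	return true;
}
```

A caller validating user-supplied attributes would reject the request when the helper returns false.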
From: Eric Dumazet
syzbot report reminded us that very big ewma_log were supported in the past,
even if they made little sense.
tc qdisc replace dev xxx root est 1sec 131072sec ...
While fixing the bug, also add boundary checks for ewma_log, in line
with range supported by iproute2.
UBSAN
From: Eric Dumazet
tc_index being 16bit wide, we need to check that TCA_TCINDEX_SHIFT
attribute is not silly.
UBSAN: shift-out-of-bounds in net/sched/cls_tcindex.c:260:29
shift exponent 255 is too large for 32-bit type 'int'
CPU: 0 PID: 8516 Comm: syz-executor228 Not tainted 5.10.0-sy
On 1/14/21 4:37 PM, Paolo Abeni wrote:
> tcp_disconnect() expects the caller acquires the sock lock,
> but mptcp_disconnect() is not doing that. Add the missing
> required lock.
>
> Reported-by: Eric Dumazet
> Fixes: 76e2a55d1625 ("mptcp: better msk-level shutdown."
From: Eric Dumazet
Heiner Kallweit reported that some skbs were sent with
the following invalid GSO properties :
- gso_size > 0
- gso_type == 0
This was triggering a WARN_ON_ONCE() in rtl8169_tso_csum_v2.
Juerg Haefliger was able to reproduce a similar issue using
a lan78xx NIC and a workl
On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> This patch is used to construct skb based on page to save memory copy
> overhead.
>
> This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the
> network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to
> directly construct skb. If
On 12/8/20 10:45 AM, SeongJae Park wrote:
> From: SeongJae Park
>
> In 'fqdir_exit()', a work for destruction of the 'fqdir' is enqueued.
> The work function, 'fqdir_work_fn()', calls 'rcu_barrier()'. In case of
> intensive 'fqdir_exit()' (e.g., frequent 'unshare(CLONE_NEWNET)'
> system calls)
dwell ; Ingemar Johansson S
>> ; Yuchung Cheng
>> ; Soheil Hassas Yeganeh ; Eric
>> Dumazet
>> Subject: Re: [PATCH net] tcp: fix cwnd-limited bug for TSO deferral where we
>> send nothing
>>
>> On Tue, 8 Dec 2020 22:57:59 -0500 Neal Cardwell wrote:
>>> F
From: Eric Dumazet
We noticed that with a LOCKDEP enabled kernel,
allocating a hash table with 65536 buckets would
use more than 60ms.
htab_init_buckets() runs from process context,
it is safe to schedule to avoid latency spikes.
Fixes: c50eb518e262 ("bpf: Use separate lockdep class for
On 12/22/20 1:38 PM, weichenchen wrote:
> pneigh_enqueue() tries to obtain a random delay by mod
> NEIGH_VAR(p, PROXY_DELAY). However, NEIGH_VAR(p, PROXY_DELAY)
> might be zero at that point because someone could write zero
> to /proc/sys/net/ipv4/neigh/[device]/proxy_delay after the
> callers c