[PATCH net-next] ipv4/igmp: shrink struct ip_sf_list

2019-05-22 Thread Eric Dumazet
Removing two 4-byte holes allows using the kmalloc-32 kmem cache instead of kmalloc-64 on 64-bit kernels. Signed-off-by: Eric Dumazet --- include/linux/igmp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/igmp.h b/include/linux/igmp.h index
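The size arithmetic behind this can be shown in userspace C. The two structs below are hypothetical layouts with the same kinds of fields as struct ip_sf_list (a next pointer, a 4-byte address, two unsigned long counters, three byte-sized flags). They are not copied from include/linux/igmp.h. On an LP64 build, placing the 4-byte field between 8-byte-aligned members leaves a hole plus tail padding (40 bytes, so it lands in a 64-byte bucket), while moving it after the counters brings the size down to 32. The bucket_for() helper is a simplification that ignores the kernel's kmalloc-96 and kmalloc-192 caches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with padding (not the real struct ip_sf_list). */
struct sf_holey {
	struct sf_holey *next;      /* 8 bytes on LP64 */
	uint32_t inaddr;            /* 4 bytes, followed by a 4-byte hole */
	unsigned long count[2];     /* 16 bytes, needs 8-byte alignment */
	unsigned char gsresp;       /* three 1-byte flags ... */
	unsigned char oldin;
	unsigned char crcount;      /* ... then 5 bytes of tail padding */
};

/* Same fields, reordered so the 4-byte member follows the 8-byte ones. */
struct sf_packed {
	struct sf_packed *next;
	unsigned long count[2];
	uint32_t inaddr;
	unsigned char gsresp;
	unsigned char oldin;
	unsigned char crcount;      /* only 1 byte of tail padding */
};

/* Round a size up to the next power-of-two bucket (8, 16, 32, 64, ...). */
static size_t bucket_for(size_t n)
{
	size_t b = 8;

	while (b < n)
		b <<= 1;
	return b;
}
```

On a 32-bit build both layouts happen to be the same size, which is why the saving is specific to 64-bit kernels.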

[PATCH net] ipv4/igmp: fix another memory leak in igmpv3_del_delrec()

2019-05-22 Thread Eric Dumazet
ys_setsockopt net/socket.c:2086 [inline] [<ac198ef0>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086 [<0a770437>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301 [<d3adb93b>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 9c8bb163a

Re: [net:master 11/11] net//ipv4/igmp.c:2157:2: error: implicit declaration of function 'ip_sf_list_clear_all'; did you mean 'ip_mc_filter_del'?

2019-05-22 Thread Eric Dumazet
On Wed, May 22, 2019 at 6:19 PM kbuild test robot wrote: > > tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master > head: 3580d04aa674383c42de7b635d28e52a1e5bc72c > commit: 3580d04aa674383c42de7b635d28e52a1e5bc72c [11/11] ipv4/igmp: fix > another memory leak in igmpv3_de

[PATCH net] ipv4/igmp: fix build error if !CONFIG_IP_MULTICAST

2019-05-22 Thread Eric Dumazet
ip_sf_list_clear_all() needs to be defined even if !CONFIG_IP_MULTICAST Fixes: 3580d04aa674 ("ipv4/igmp: fix another memory leak in igmpv3_del_delrec()") Signed-off-by: Eric Dumazet Reported-by: kbuild test robot --- net/ipv4/igmp.c | 22 +++--- 1 file changed, 11

Re: [PATCH net-next] selftests/net: SO_TXTIME with ETF and FQ

2019-05-23 Thread Eric Dumazet
(us) > > SO_TXTIME ipv6 clock monolithic > payload:a delay:10049 expected:1 (us) > > SO_TXTIME ipv4 clock monolithic > payload:a delay:10105 expected:1 (us) Thanks for the test Willem. Acked-by: Eric Dumazet

[PATCH net-next 08/11] net: rename inet_frags_init_net() to fdir_init()

2019-05-24 Thread Eric Dumazet
And pass an extra parameter, since we will soon dynamically allocate fqdir structures. Signed-off-by: Eric Dumazet --- include/net/inet_frag.h | 3 ++- net/ieee802154/6lowpan/reassembly.c | 3 +-- net/ipv4/ip_fragment.c | 3 +-- net/ipv6/netfilter

[PATCH net-next 02/11] net: rename inet_frags_exit_net() to fqdir_exit()

2019-05-24 Thread Eric Dumazet
Signed-off-by: Eric Dumazet --- include/net/inet_frag.h | 2 +- net/ieee802154/6lowpan/reassembly.c | 4 ++-- net/ipv4/inet_fragment.c| 4 ++-- net/ipv4/ip_fragment.c | 4 ++-- net/ipv6/netfilter/nf_conntrack_reasm.c | 4 ++-- net/ipv6

[PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle

2019-05-24 Thread Eric Dumazet
backport it once soaked a bit. Eric Dumazet (11): inet: rename netns_frags to fqdir net: rename inet_frags_exit_net() to fqdir_exit() net: rename struct fqdir fields ipv4: no longer reference init_net in ip4_frags_ns_ctl_table[] ipv6: no longer reference init_net in ip6_frags_ns_ctl_table

[PATCH net-next 04/11] ipv4: no longer reference init_net in ip4_frags_ns_ctl_table[]

2019-05-24 Thread Eric Dumazet
(struct net *)->ipv4.fqdir will soon be a pointer, so make sure ip4_frags_ns_ctl_table[] does not reference init_net. ip4_frags_ns_ctl_register() can perform the needed initialization for all netns. Signed-off-by: Eric Dumazet --- net/ipv4/ip_fragment.c | 18 ++ 1 file chan

[PATCH net-next 05/11] ipv6: no longer reference init_net in ip6_frags_ns_ctl_table[]

2019-05-24 Thread Eric Dumazet
(struct net *)->ipv6.fqdir will soon be a pointer, so make sure ip6_frags_ns_ctl_table[] does not reference init_net. ip6_frags_ns_ctl_register() can perform the needed initialization for all netns. Signed-off-by: Eric Dumazet --- net/ipv6/reassembly.c | 15 +-- 1 file changed

[PATCH net-next 03/11] net: rename struct fqdir fields

2019-05-24 Thread Eric Dumazet
Rename the @frags fields from structs netns_ipv4, netns_ipv6, netns_nf_frag and netns_ieee802154_lowpan to @fqdir Signed-off-by: Eric Dumazet --- include/net/netns/ieee802154_6lowpan.h | 2 +- include/net/netns/ipv4.h| 2 +- include/net/netns/ipv6.h| 4

[PATCH net-next 07/11] ieee820154: 6lowpan: no longer reference init_net in lowpan_frags_ns_ctl_table

2019-05-24 Thread Eric Dumazet
(struct net *)->ieee802154_lowpan.fqdir will soon be a pointer, so make sure lowpan_frags_ns_ctl_table[] does not reference init_net. lowpan_frags_ns_sysctl_register() can perform the needed initialization for all netns. Signed-off-by: Eric Dumazet --- net/ieee802154/6lowpan/reassembly.c |

[PATCH net-next 01/11] inet: rename netns_frags to fqdir

2019-05-24 Thread Eric Dumazet
1) struct netns_frags is renamed to struct fqdir. This structure really holds many frag queues in a hash table. 2) The (struct inet_frag_queue)->net field is renamed to fqdir, since net is generally associated with a 'struct net' pointer in the networking stack. Signed-off-by:

[PATCH net-next 09/11] net: add a net pointer to struct fqdir

2019-05-24 Thread Eric Dumazet
fqdir will soon be dynamically allocated. We need to reach the struct net pointer from fqdir, so add it, and replace the various container_of() constructs by direct access to the new field. Signed-off-by: Eric Dumazet --- include/net/inet_frag.h | 5 - net/ieee802154
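Why the back-pointer is needed can be sketched in userspace C. While a directory structure is embedded in the namespace, container_of() can recover the namespace from a pointer to the member; once the directory is allocated separately, that arithmetic no longer points anywhere meaningful, so a stored pointer is used instead. All struct and function names below are hypothetical stand-ins, not the kernel's struct net or struct fqdir.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Recover the enclosing struct from a pointer to one of its members.
 * Only valid while the member is embedded in that struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Layout A (hypothetical): the directory is embedded in the namespace. */
struct fq_embedded { long high_thresh; };
struct ns_embedded { int id; struct fq_embedded fq; };

static struct ns_embedded *ns_of_embedded(struct fq_embedded *fq)
{
	return container_of(fq, struct ns_embedded, fq);
}

/* Layout B (hypothetical): the namespace points to a separately
 * allocated directory, so a back-pointer replaces container_of(). */
struct ns_dynamic;
struct fq_dynamic { long high_thresh; struct ns_dynamic *net; };
struct ns_dynamic { int id; struct fq_dynamic *fq; };

static struct ns_dynamic *ns_dynamic_create(int id)
{
	struct ns_dynamic *ns = malloc(sizeof(*ns));

	if (!ns)
		return NULL;
	ns->fq = malloc(sizeof(*ns->fq));
	if (!ns->fq) {
		free(ns);
		return NULL;
	}
	ns->id = id;
	ns->fq->high_thresh = 0;
	ns->fq->net = ns;	/* direct access instead of container_of() */
	return ns;
}

static void ns_dynamic_destroy(struct ns_dynamic *ns)
{
	free(ns->fq);
	free(ns);
}
```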

[PATCH net-next 06/11] netfilter: ipv6: nf_defrag: no longer reference init_net in nf_ct_frag6_sysctl_table

2019-05-24 Thread Eric Dumazet
(struct net *)->nf_frag.fqdir will soon be a pointer, so make sure nf_ct_frag6_sysctl_table[] does not reference init_net. nf_ct_frag6_sysctl_register() can perform the needed initialization for all netns. Signed-off-by: Eric Dumazet --- net/ipv6/netfilter/nf_conntrack_reasm.c |

[PATCH net-next 10/11] net: dynamically allocate fqdir structures

2019-05-24 Thread Eric Dumazet
The following patch will add an RCU grace period before fqdir rhashtable destruction, so we need to dynamically allocate fqdir structures to avoid forcing expensive synchronize_rcu() calls in the netns dismantle path. Signed-off-by: Eric Dumazet --- include/net/inet_frag.h | 17

[PATCH net-next 11/11] inet: frags: rework rhashtable dismantle

2019-05-24 Thread Eric Dumazet
Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units") Signed-off-by: Eric Dumazet Reported-by: syzbot --- include/net/inet_fra

[PATCH net-next 3/3] inet: frags: fix use-after-free read in inet_frag_destroy_rcu

2019-05-27 Thread Eric Dumazet
: rework rhashtable dismantle") Signed-off-by: Eric Dumazet Reported-by: syzbot --- include/net/inet_frag.h | 3 +++ net/ipv4/inet_fragment.c | 20 ++-- 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/include/net/inet_frag.h b/include/net/inet

[PATCH net-next 0/3] inet: frags: followup to 'inet-frags-avoid-possible-races-at-netns-dismantle'

2019-05-27 Thread Eric Dumazet
smantle at module removal. Eric Dumazet (3): inet: frags: uninline fqdir_init() inet: frags: call inet_frags_fini() after unregister_pernet_subsys() inet: frags: fix use-after-free read in inet_frag_destroy_rcu include/net/inet_frag.h | 23 +++-- net/ieee802154/6lowpan/re

[PATCH net-next 2/3] inet: frags: call inet_frags_fini() after unregister_pernet_subsys()

2019-05-27 Thread Eric Dumazet
the following patch. Fixes: d4ad4d22e7ac ("inet: frags: use kmem_cache for inet_frag_queue") Signed-off-by: Eric Dumazet --- net/ieee802154/6lowpan/reassembly.c | 2 +- net/ipv6/reassembly.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ieee8021

[PATCH net-next 1/3] inet: frags: uninline fqdir_init()

2019-05-27 Thread Eric Dumazet
fqdir_init() is not in the fast path and is getting bigger. Signed-off-by: Eric Dumazet --- include/net/inet_frag.h | 20 +--- net/ipv4/inet_fragment.c | 19 +++ 2 files changed, 20 insertions(+), 19 deletions(-) diff --git a/include/net/inet_frag.h b/include/net

[PATCH net] llc: fix skb leak in llc_build_and_send_ui_pkt()

2019-05-27 Thread Eric Dumazet
:1972 [inline] [<922d78d9>] __x64_sys_sendto+0x2a/0x30 net/socket.c:1972 [<cec820c1>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301 [<0c32554f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1da177e4c3f4 ("

Re: [PATCH net-next 11/11] inet: frags: rework rhashtable dismantle

2019-05-28 Thread Eric Dumazet
On Mon, May 27, 2019 at 11:34 PM Herbert Xu wrote: > > Hi Eric: > > Eric Dumazet wrote: > > > > +void fqdir_exit(struct fqdir *fqdir) > > +{ > > + fqdir->high_thresh = 0; /* prevent creation of new frags */ > > + > > +

Re: [PATCH v3 bpf-next 1/6] bpf: Create BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY

2019-05-28 Thread Eric Dumazet
On 5/27/19 8:49 PM, brakmo wrote: > Create new macro BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY() to be used by > __cgroup_bpf_run_filter_skb for EGRESS BPF progs so BPF programs can > request cwr for TCP packets. > ... > +#define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)

Re: [PATCH] tcp: re-enable high throughput for low pacing rate

2019-05-28 Thread Eric Dumazet
On 5/28/19 11:28 AM, Sergej Benilov wrote: > Since commit 605ad7f184b60cfaacbc038aa6c55ee68dee3c89 "tcp: refine TSO > autosizing", > the TSQ limit is computed as the smaller of > sysctl_tcp_limit_output_bytes and max(2 * skb->truesize, sk->sk_pacing_rate > >> 10). > For low pacing rates, this
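To make the quoted formula concrete, the hypothetical helper below evaluates min(limit_output_bytes, max(2 * truesize, pacing_rate >> 10)). It assumes sk_pacing_rate is expressed in bytes per second, so pacing_rate >> 10 is roughly one millisecond worth of data. The sysctl value and truesize are passed in as parameters rather than taken from any particular kernel default.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the TSQ limit as described in the quoted message. */
static uint64_t tsq_limit(uint64_t limit_output_bytes, uint64_t truesize,
			  uint64_t pacing_rate)
{
	uint64_t two_skbs = 2 * truesize;
	uint64_t about_1ms = pacing_rate >> 10;	/* bytes/s divided by 1024 */
	uint64_t want = two_skbs > about_1ms ? two_skbs : about_1ms;

	return want < limit_output_bytes ? want : limit_output_bytes;
}
```

At 1,000,000 bytes/s the rate term is only 976 bytes, so 2 * truesize (4000 with truesize = 2000) sets the limit. At 10,000,000 bytes/s the rate term (9765 bytes) takes over, and at very high rates the sysctl cap wins.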

Re: [PATCH] v3.19.8: tcp: re-enable high throughput for low pacing rate

2019-05-28 Thread Eric Dumazet
On 5/28/19 11:34 AM, Sergej Benilov wrote: > Since commit 605ad7f184b60cfaacbc038aa6c55ee68dee3c89 "tcp: refine TSO > autosizing", > the TSQ limit is computed as the smaller of > sysctl_tcp_limit_output_bytes and max(2 * skb->truesize, sk->sk_pacing_rate > >> 10). > For low pacing rates, this

Re: [PATCH v3 bpf-next 1/6] bpf: Create BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY

2019-05-28 Thread Eric Dumazet
On 5/28/19 11:54 AM, Lawrence Brakmo wrote: > On 5/28/19, 6:43 AM, "netdev-ow...@vger.kernel.org on behalf of Eric Dumazet" > wrote: > > Why are you using preempt_enable_no_resched() here ? > > Because that is what __BPF_PROG_RUN

Re: [PATCH net] net/sched: act_pedit: fix 'ex munge' on network header in case of QinQ packet

2019-05-28 Thread Eric Dumazet
On 5/28/19 1:50 PM, Davide Caratti wrote: > Like it has been done in commit 2ecba2d1e45b ("net: sched: act_csum: Fix > csum calc for tagged packets"), also 'pedit' needs to adjust the network > offset when multiple tags are present in the packets: otherwise wrong IP > headers (but good checksums

Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE

2019-05-29 Thread Eric Dumazet
> Therefore this patch removes the unnecessary smp_store_release call > as well as the corresponding READ_ONCE on the read-side in order to > not confuse future readers of this code. Comments have been added > in their places. > > Signed-off-by: Herbert Xu > SGTM, thanks. Reviewed-by: Eric Dumazet David, this targets the net-next tree :)

[PATCH net] net-gro: fix use-after-free read in napi_gro_frags()

2019-05-29 Thread Eric Dumazet
x1b3/0x2f0 fs/read_write.c:1015 do_writev+0x15b/0x330 fs/read_write.c:1058 Fixes: a50e233c50db ("net-gro: restore frag0 optimization") Signed-off-by: Eric Dumazet Reported-by: syzbot --- net/core/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/dev.c

Re: [PATCH net-next 1/7] net: Don't disable interrupts in napi_alloc_frag()

2019-05-29 Thread Eric Dumazet
On 5/29/19 3:15 PM, Sebastian Andrzej Siewior wrote: > netdev_alloc_frag() can be used from any context and is used by NAPI > and non-NAPI drivers. Non-NAPI drivers use it in interrupt context > and NAPI drivers use it during initial allocation (->ndo_open() or > ->ndo_change_mtu()). Some NAPI d

Re: [PATCH] net/neighbour: fix potential null pointer deference

2019-05-31 Thread Eric Dumazet
On 5/31/19 1:29 AM, Young Xiao wrote: > There is a possible null pointer deference bugs in neigh_fill_info(), > which is similar to the bug which was fixed in commit 6adc5fd6a142 > ("net/neighbour: fix crash at dumping device-agnostic proxy entries"). > > Signed-off-by: Young Xiao <92siuy...@gm

Re: [PATCH] net/vxlan: fix potential null pointer deference

2019-05-31 Thread Eric Dumazet
On 5/31/19 1:34 AM, Young Xiao wrote: > There is a possible null pointer deference bug in vxlan_fdb_info(), > which is similar to the bug which was fixed in commit 6adc5fd6a142 > ("net/neighbour: fix crash at dumping device-agnostic proxy entries"). > > Signed-off-by: Young Xiao <92siuy...@gmai

Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE

2019-05-31 Thread Eric Dumazet
On 5/31/19 7:45 AM, Herbert Xu wrote: > On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote: >> >> OK, let's call it barrier. But we need more than a barrier here then. > > READ_ONCE/WRITE_ONCE is not some magical dust that you sprinkle > around in your code to make it work without lo

Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE

2019-05-31 Thread Eric Dumazet
On Fri, May 31, 2019 at 9:29 AM Andrea Parri wrote: > > On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote: > > On 5/31/19 7:45 AM, Herbert Xu wrote: > > > > In this case the code doesn't need them because an implicit > > > barrier() (which i

Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE

2019-05-31 Thread Eric Dumazet
On Fri, May 31, 2019 at 10:11 AM Paul E. McKenney wrote: > > On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote: > > > > > > On 5/31/19 7:45 AM, Herbert Xu wrote: > > > On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote: > > >>

[PATCH net-next] ipv6: icmp: use this_cpu_read() in icmpv6_sk()

2019-05-31 Thread Eric Dumazet
In general, this_cpu_read(*X) is faster than *this_cpu_ptr(X). Also remove the inline attribute, which is totally useless. Signed-off-by: Eric Dumazet Cc: Kefeng Wang --- net/ipv6/icmp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c index

Re: [PATCH net-next 1/4] net: use indirect calls helpers for ptype hook

2019-05-31 Thread Eric Dumazet
On 5/3/19 8:01 AM, Paolo Abeni wrote: > This avoids an indirect call per RX IPv6/IPv4 packet. > Note that we don't want to use the indirect calls helper for taps. > > Signed-off-by: Paolo Abeni > --- > net/core/dev.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --gi
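The idea behind the indirect call helpers is to compare a function pointer against its likely targets and call the matching one by name, which compiles to a direct call and so avoids a retpoline thunk. The sketch below is a hypothetical userspace illustration: CALL_LIKELY2, v4_rcv, v6_rcv and other_rcv are illustrative names, not the kernel's INDIRECT_CALL_* macros or its real packet handlers.

```c
#include <assert.h>

struct pkt {
	int proto;
};

typedef int (*rcv_fn)(struct pkt *);

/* Hypothetical handlers standing in for per-protocol receive functions. */
static int v4_rcv(struct pkt *p) { (void)p; return 4; }
static int v6_rcv(struct pkt *p) { (void)p; return 6; }
static int other_rcv(struct pkt *p) { (void)p; return -1; }

/* If f matches a likely target, call that target by name (a direct call);
 * only unknown targets go through the real indirect call. f is evaluated
 * more than once, so pass a plain variable. */
#define CALL_LIKELY2(f, f2, f1, ...)			\
	((f) == (f2) ? (f2)(__VA_ARGS__) :		\
	 (f) == (f1) ? (f1)(__VA_ARGS__) : (f)(__VA_ARGS__))

static int deliver(rcv_fn fn, struct pkt *p)
{
	return CALL_LIKELY2(fn, v6_rcv, v4_rcv, p);
}
```

The result is always the same as calling fn(p) directly; only the code generated for the common cases changes, which matters when retpolines make indirect calls expensive.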

Re: [PATCH net-next 0/7] net: add struct nexthop to fib{6}_info

2019-05-31 Thread Eric Dumazet
On 5/31/19 2:38 PM, David Ahern wrote: > On 5/31/19 3:29 PM, David Miller wrote: >> David, can you add some supplementary information to your cover letter >> et al. which seems to be part of what Alexei is asking for and seems >> quite reasonable? >> > > It is not clear to me what more is want

[PATCH net-next] ipv6: use this_cpu_read() in rt6_get_pcpu_route()

2019-05-31 Thread Eric Dumazet
this_cpu_read(*X) is faster than *this_cpu_ptr(X) Signed-off-by: Eric Dumazet --- net/ipv6/route.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index fada5a13bcb2a286bb20a350c1873b1b16dc866a

[PATCH net-next] ipv4: icmp: use this_cpu_read() in icmp_sk()

2019-05-31 Thread Eric Dumazet
this_cpu_read(*X) is faster than *this_cpu_ptr(X) Signed-off-by: Eric Dumazet --- net/ipv4/icmp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index f3a5893b1e8619716f19f85dc77f2e1e12284b4d..49d6b037b113e85877f8e689e690f1c0d3427386

[PATCH net-next] tcp: use this_cpu_read(*X) instead of *this_cpu_ptr(X)

2019-05-31 Thread Eric Dumazet
this_cpu_read(*X) is slightly faster than *this_cpu_ptr(X) Signed-off-by: Eric Dumazet --- net/ipv4/tcp_ipv4.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index af81e4a6a8d8eac9aad551a129384ff6b1bf2f6c

Re: [PATCH net-next 0/7] net: add struct nexthop to fib{6}_info

2019-05-31 Thread Eric Dumazet
On 5/31/19 7:29 PM, David Ahern wrote: > On 5/31/19 7:04 PM, Eric Dumazet wrote: >> >> I have a bunch (about 15) of syzbot reports, probably caused by your latest >> patch series. >> >> Do we want to stabilize first, or do you expect this new pa

Re: [PATCH net-next 0/7] net: add struct nexthop to fib{6}_info

2019-05-31 Thread Eric Dumazet
On 5/31/19 7:34 PM, Eric Dumazet wrote: > > > On 5/31/19 7:29 PM, David Ahern wrote: >> On 5/31/19 7:04 PM, Eric Dumazet wrote: >>> >>> I have a bunch (about 15) of syzbot reports, probably caused by your >>> latest patch series. >>> >

[PATCH net-next] net: fix use-after-free in kfree_skb_list

2019-06-02 Thread Eric Dumazet
kbuff fraglist splitter") Fixes: c8b17be0b7a4 ("net: ipv4: add skbuff fraglist splitter") Signed-off-by: Eric Dumazet Cc: Pablo Neira Ayuso --- include/net/ip.h | 1 - include/net/ipv6.h| 1 - net/ipv4/ip_output.c | 5 ++--- net/ipv6/ip6_output.c | 5 ++--- net/ipv6/netfilter.

Re: [PATCH net v2] net: fix indirect calls helpers for ptype list hooks.

2019-06-04 Thread Eric Dumazet
stead we can wrap the list_func invocation. > > v1 -> v2: > - use the correct fix tag > > Fixes: f5737cbadb7d ("net: use indirect calls helpers for ptype hook") > Suggested-by: Eric Dumazet > Signed-off-by: Paolo Abeni > Acked-by: Edward Cree > --- Reviewed-by: Eric Dumazet

[PATCH net] ipv6: tcp: enable flowlabel reflection in some RST packets

2019-06-04 Thread Eric Dumazet
Signed-off-by: Eric Dumazet --- net/ipv6/tcp_ipv6.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index beaf284563015ef0677c39fc056e6ecde3518920..07684f1e02f773a9d3e22a86ae4e7b853cc0b73e 100644 --- a/net/ipv6/tcp_ip

Re: [PATCH net] ipv6: tcp: enable flowlabel reflection in some RST packets

2019-06-04 Thread Eric Dumazet
On 6/4/19 12:29 PM, Eric Dumazet wrote: > This extends commit 22b6722bfa59 ("ipv6: Add sysctl for per > namespace flow label reflection"), for some TCP RST packets. > > When RST packets are sent because no socket could be found, > it makes sense to use flowlabel_reflec

Re: [PATCH net] inet_connection_sock: remove unused parameter of reqsk_queue_unlink func

2019-06-05 Thread Eric Dumazet
On 6/5/19 3:49 AM, Zhiqiang Liu wrote: > small cleanup: "struct request_sock_queue *queue" parameter of > reqsk_queue_unlink > func is never used in the func, so we can remove it. > > Signed-off-by: Zhiqiang Liu > --- SGTM Reviewed-by: Eric Dumazet

[PATCH v2 net-next 0/2] ipv6: tcp: more control on RST flowlabels

2019-06-05 Thread Eric Dumazet
The first patch allows reflecting the incoming IPv6 flowlabel on RST packets sent when no socket could handle the packet. The second patch makes sure we send the same flowlabel for RST or ACK packets on behalf of TIME_WAIT sockets. Eric Dumazet (2): ipv6: tcp: enable flowlabel reflection in some RST

[PATCH v2 net-next 1/2] ipv6: tcp: enable flowlabel reflection in some RST packets

2019-06-05 Thread Eric Dumazet
n order to provide full control of this new feature, flowlabel_reflect becomes a bitmask. Signed-off-by: Eric Dumazet --- Documentation/networking/ip-sysctl.txt | 20 +++- net/ipv6/af_inet6.c| 2 +- net/ipv6/sysctl_net_ipv6.c | 3 +++ net/ipv6/

[PATCH v2 net-next 2/2] ipv6: tcp: send consistent flowlabel in TIME_WAIT state

2019-06-05 Thread Eric Dumazet
flowlabel. Signed-off-by: Eric Dumazet Cc: Florent Fourcot --- net/ipv6/tcp_ipv6.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 4ccb06ea8ce32d614fc0848e1c4e74b441fa1f2c..f4e609a48e68442693936050c2336ca1e80e1710 100644 --- a/net/ipv6/

[PATCH net] ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero

2019-06-06 Thread Eric Dumazet
RCU 101 : Before taking a refcount, make sure the object is not already scheduled for deletion. Fixes: 18367681a10b ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.") Signed-off-by: Eric Dumazet Cc: Willem de Bruijn --- net/ipv6/ip6_flowlabel.c | 4 ++-- 1 fi

Re: [PATCH net] ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero

2019-06-06 Thread Eric Dumazet
On Thu, Jun 6, 2019 at 2:22 PM Eric Dumazet wrote: > > RCU 101 : Before taking a refcount, make sure the object is not already > scheduled for deletion. > I will send a V2, there is a second atomic_inc() which needs to be changed in ipv6_flowlabel_opt()

[PATCH v2 net] ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero

2019-06-06 Thread Eric Dumazet
Before taking a refcount, make sure the object is not already scheduled for deletion. Same fix is needed in ipv6_flowlabel_opt() Fixes: 18367681a10b ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.") Signed-off-by: Eric Dumazet Cc: Willem de Bruijn --- net/ipv6/ip6_fl
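The "RCU 101" rule can be sketched with C11 atomics. ref_inc_not_zero() below is a hypothetical userspace analogue of atomic_inc_not_zero(): a reader that found the object under RCU may only take a reference if the count has not already dropped to zero, because a zero count means the object is already on its way to being freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *refs only if it is not zero. Returns true if a reference
 * was taken, false if the object is already scheduled for deletion. */
static bool ref_inc_not_zero(atomic_int *refs)
{
	int old = atomic_load(refs);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(refs, &old, old + 1))
			return true;
	}
	return false;
}
```

A plain atomic increment would turn 0 into 1 and hand out a reference to an object whose free is already queued; the compare-exchange loop refuses that transition.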

Re: [PATCH] sis900: re-enable high throughput

2019-06-07 Thread Eric Dumazet
2d2e..ca17b50c 100644 >> --- a/drivers/net/ethernet/sis/sis900.c >> +++ b/drivers/net/ethernet/sis/sis900.c >> @@ -1604,6 +1604,7 @@ sis900_start_xmit(struct sk_buff *skb, struct >> net_device *net_dev) >> unsigned int index_cur_tx, index_dirty_tx; >> unsigned int cou

[PATCH net-next 1/1] ipv6: tcp: fix potential NULL deref in tcp_v6_send_reset()

2019-06-07 Thread Eric Dumazet
mit 0f85feae6b71 ("tcp: fix more NULL deref after prequeue changes"), I should have known better. Fixes: 323a53c41292 ("ipv6: tcp: enable flowlabel reflection in some RST packets") Signed-off-by: Eric Dumazet Reported-by: syzbot --- net/ipv6/tcp_ipv6.c | 2 +- 1 file changed, 1 i

[PATCH net-next] ipv6: tcp: send consistent autoflowlabel in TIME_WAIT state

2019-06-08 Thread Eric Dumazet
txhash into the TIME_WAIT socket. After this patch, ACK or RST packets sent on behalf of a TIME_WAIT socket have the flowlabel that was previously used by the flow. Signed-off-by: Eric Dumazet --- include/net/inet_timewait_sock.h | 1 + net/ipv4/tcp_minisocks.c | 1 + net/

[PATCH net-next] tcp: take care of SYN_RECV sockets in tcp_v4_send_ack() and tcp_v6_send_response()

2019-06-10 Thread Eric Dumazet
lso provides a socket pointer to sock_net_uid() calls. Fixes: 00483690552c ("tcp: Add mark for TIMEWAIT sockets") Signed-off-by: Eric Dumazet Cc: Jon Maxwell --- net/ipv4/tcp_ipv4.c | 6 -- net/ipv6/tcp_ipv6.c | 1 + 2 files changed, 5 insertions(+), 2 deletions(-) diff --g

Re: [PATCH net-next] tcp: take care of SYN_RECV sockets in tcp_v4_send_ack() and tcp_v6_send_response()

2019-06-10 Thread Eric Dumazet
On Mon, Jun 10, 2019 at 3:04 PM David Miller wrote: > > From: Eric Dumazet > Date: Mon, 10 Jun 2019 14:45:43 -0700 > > > Using sk_to_full_sk() should get back to the listener socket. > > net/ipv6/tcp_ipv6.c: In function ‘tcp_v6_send_response’: > net/ipv6/tcp_ipv6

Re: [PATCH net-next] tcp: take care of SYN_RECV sockets in tcp_v4_send_ack() and tcp_v6_send_response()

2019-06-10 Thread Eric Dumazet
> On Tue, Jun 11, 2019 at 7:45 AM Eric Dumazet wrote: > > > > TCP can send ACK packets on behalf of SYN_RECV sockets. > > > > tcp_v4_send_ack() and tcp_v6_send_response() incorrectly > > dereference sk->sk_mark for non TIME_WAIT sockets. > > > >

Re: [Patch net] net_sched: unset TCQ_F_CAN_BYPASS when adding filters

2019-07-13 Thread Eric Dumazet
> This fix is not perfect, it only unsets the flag but does not set it back > because we have to save the information somewhere in the qdisc if we > really want that. > > Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM") > Cc: Eric Dumazet > Signed-off-by: Cong

Re: [bpf-next RFC 3/6] bpf: add bpf_tcp_gen_syncookie helper

2019-07-16 Thread Eric Dumazet
On 7/16/19 2:26 AM, Petar Penkov wrote: > From: Petar Penkov > > This helper function allows BPF programs to try to generate SYN > cookies, given a reference to a listener socket. The function works > from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a > socket in both cases

Re: [PATCH] net/mlx5: Replace kfree with kvfree

2019-07-17 Thread Eric Dumazet
On 7/17/19 10:03 AM, Chuhong Yuan wrote: > Variable allocated by kvmalloc should not be freed by kfree. > Because it may be allocated by vmalloc. > So replace kfree with kvfree here. > > Signed-off-by: Chuhong Yuan > --- Please add corresponding Fixes: tag, thanks ! > drivers/net/ethernet/m

Re: [Patch net v2] net_sched: unset TCQ_F_CAN_BYPASS when adding filters

2019-07-17 Thread Eric Dumazet
we setup in any scenario, otherwise our packets steering > > policy could not be enforced. > ... > > Eric I think your feedback was addressed, please review to confirm. Yes, this seems good to me, thanks. Reviewed-by: Eric Dumazet

Re: IP GRO verifies csum again?

2019-07-18 Thread Eric Dumazet
On 7/18/19 9:49 AM, Jacob Wen wrote: > Hi, > > inet_gro_receive verifies IP csum but a NIC already did so and set > CHECKSUM_UNNECESSARY. > > > https://github.com/torvalds/linux/blob/v5.2/net/ipv4/af_inet.c#L1432-L1433 > > if (unlikely(ip_fast_csum((u8 *)iph, 5))) > >         goto out_unlo
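For reference, the ip_fast_csum((u8 *)iph, 5) call quoted above folds a 20-byte header (five 32-bit words) into a ones'-complement sum and yields 0 for an intact header. A portable, unoptimized RFC 1071 version is sketched below. It is not the kernel's architecture-specific implementation, and the sample header bytes in the test are a textbook example rather than captured traffic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 checksum over an IPv4 header of ihl 32-bit words.
 * Returns 0 when the header, including its checksum field, is intact. */
static uint16_t ipv4_hdr_csum(const uint8_t *hdr, size_t ihl)
{
	uint32_t sum = 0;

	/* Add the header as big-endian 16-bit words. */
	for (size_t i = 0; i < ihl * 4; i += 2)
		sum += (uint32_t)hdr[i] << 8 | hdr[i + 1];
	/* Fold carries back into the low 16 bits (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Run over a header with its checksum field zeroed, the same function produces the value to store in that field.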

[PATCH net] tcp: fix tcp_set_congestion_control() use from bpf hook

2019-07-18 Thread Eric Dumazet
ix is to add a new parameter to tcp_set_congestion_control(), so that the ns_capable() call is only performed under the right context. Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") Signed-off-by: Eric Dumazet Cc: Lawrence Brakmo Reported-by: Neal Cardwell --- include/net

[PATCH net] tcp: be more careful in tcp_fragment()

2019-07-19 Thread Eric Dumazet
nt() should apply sane memory limits") Signed-off-by: Eric Dumazet Reported-by: Andrew Prout Tested-by: Andrew Prout Tested-by: Jonathan Lemon Tested-by: Michal Kubecek Acked-by: Neal Cardwell Acked-by: Yuchung Cheng Acked-by: Christoph Paasch Cc: Jonathan Looney --- include/ne

Re: [bpf-next 0/6] Introduce a BPF helper to generate SYN cookies

2019-07-22 Thread Eric Dumazet
On Tue, Jul 23, 2019 at 2:20 AM Petar Penkov wrote: > > From: Petar Penkov > > This patch series introduces a BPF helper function that allows generating SYN > cookies from BPF. Currently, this helper is enabled at both the TC hook and > the > XDP hook. Please provide performance numbers ? We

[PATCH bpf 2/2] selftests/bpf: add another gso_segs access

2019-07-23 Thread Eric Dumazet
Use BPF_REG_1 for source and destination of gso_segs read, to exercise "bpf: fix access to skb_shared_info->gso_segs" fix. Signed-off-by: Eric Dumazet Suggested-by: Stanislav Fomichev --- tools/testing/selftests/bpf/verifier/ctx_skb.c | 11 +++ 1 file changed, 11 insert

[PATCH bpf 1/2] bpf: fix access to skb_shared_info->gso_segs

2019-07-23 Thread Eric Dumazet
w BPF programs access skb_shared_info->gso_segs field") Signed-off-by: Eric Dumazet Reported-by: syzbot --- net/core/filter.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index 4e2a79b2fd77f36ba2

[PATCH bpf 0/2] bpf: gso_segs fixes

2019-07-23 Thread Eric Dumazet
First patch changes the kernel, second patch adds a new test. Note that other patches might be needed to take care of similar issues in sock_ops_convert_ctx_access() and SOCK_OPS_GET_FIELD() Eric Dumazet (2): bpf: fix access to skb_shared_info->gso_segs selftests/bpf: add another gso_s

Re: [PATCH net-next] dpaa2-eth: Don't use netif_receive_skb_list for TCP frames

2019-07-24 Thread Eric Dumazet
On 7/23/19 7:28 PM, Ioana Radulescu wrote: > Using Rx skb bulking for all frames may negatively impact the > performance in some TCP termination scenarios, as it effectively > bypasses GRO. > > - list_add_tail(&skb->list, ch->rx_list); > + if (frame_is_tcp(fd, fas)) > + na

Re: [PATCH] tcp: add new tcp_mtu_probe_floor sysctl

2019-07-27 Thread Eric Dumazet
On Sat, Jul 27, 2019 at 4:23 AM Josh Hunt wrote: > > The current implementation of TCP MTU probing can considerably > underestimate the MTU on lossy connections allowing the MSS to get down to > 48. We have found that in almost all of these cases on our networks these > paths can handle much large

Re: [PATCH] tcp: add new tcp_mtu_probe_floor sysctl

2019-07-28 Thread Eric Dumazet
On Sun, Jul 28, 2019 at 1:21 AM Josh Hunt wrote: > > On 7/27/19 12:05 AM, Eric Dumazet wrote: > > On Sat, Jul 27, 2019 at 4:23 AM Josh Hunt wrote: > >> > >> The current implementation of TCP MTU probing can considerably > >> underestimate the MTU on lossy c

Re: [PATCH v2 2/2] tcp: Update TCP_BASE_MSS comment

2019-08-07 Thread Eric Dumazet
On 8/8/19 1:52 AM, Josh Hunt wrote: > TCP_BASE_MSS is used as the default initial MSS value when MTU probing is > enabled. Update the comment to reflect this. > > Suggested-by: Neal Cardwell > Signed-off-by: Josh Hunt > --- Signed-off-by: Eric Dumazet

Re: [PATCH v2 1/2] tcp: add new tcp_mtu_probe_floor sysctl

2019-08-07 Thread Eric Dumazet
> The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives > administrators the ability to control the floor of MSS probing. > > Signed-off-by: Josh Hunt Signed-off-by: Eric Dumazet
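As a rough illustration of what a floor does, the hypothetical model below halves the MSS each time probing decides to shrink it and clamps the result at a configurable floor. It is deliberately simpler than the kernel's MTU probing code; next_probe_mss() is an illustrative name, and the only value taken from the thread is the 48-byte default (TCP_MIN_SND_MSS).

```c
#include <assert.h>

/* Hypothetical, simplified model: halve the MSS on each downward probe
 * step, but never go below floor_mss. */
static int next_probe_mss(int cur_mss, int floor_mss)
{
	int mss = cur_mss / 2;

	return mss < floor_mss ? floor_mss : mss;
}
```

With the default floor of 48, ten rounds of halving from 1460 end at 48; raising the floor to 512 stops the search at 512 instead.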

Re: [PATCH net 1/2] sock: make cookie generation global instead of per netns

2019-08-08 Thread Eric Dumazet
ue does not provide much value either way. > > Signed-off-by: Daniel Borkmann > Cc: Eric Dumazet > Cc: Alexei Starovoitov > Cc: Willem de Bruijn > Cc: Martynas Pumputis > --- > include/net/net_namespace.h | 1 - > include/uapi/linux/bpf.h| 4 ++-- > net/core/sock_

Re: [PATCH net 1/2] sock: make cookie generation global instead of per netns

2019-08-08 Thread Eric Dumazet
On Thu, Aug 8, 2019 at 1:09 PM Daniel Borkmann wrote: > > On 8/8/19 12:45 PM, Eric Dumazet wrote: > > On Thu, Aug 8, 2019 at 11:50 AM Daniel Borkmann > > wrote: > > > >> Socket cookie consumers must assume the value as opaque in any case. > >> The
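A sketch of lazily assigning opaque cookies from one counter shared by all namespaces follows. The names are hypothetical and this is not the kernel's cookie code; it only shows why a single global counter plus a compare-exchange gives each socket one stable, unique, non-zero value, even if two threads race to assign it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* One counter shared by every namespace (the "global" part). */
static _Atomic uint64_t cookie_gen;

struct sock_sketch {
	_Atomic uint64_t cookie;	/* 0 means "not assigned yet" */
};

/* Return the socket's cookie, assigning one on first use. If two threads
 * race, exactly one compare-exchange succeeds and both return its value. */
static uint64_t sock_cookie(struct sock_sketch *sk)
{
	for (;;) {
		uint64_t res = atomic_load(&sk->cookie);

		if (res)
			return res;
		res = atomic_fetch_add(&cookie_gen, 1) + 1;
		uint64_t expected = 0;
		atomic_compare_exchange_strong(&sk->cookie, &expected, res);
	}
}
```

Because consumers must treat the value as opaque, gaps left in the sequence by lost races do not matter.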

Re: [PATCH net-next] r8169: make use of xmit_more

2019-08-09 Thread Eric Dumazet
On Fri, Aug 9, 2019 at 10:04 AM Holger Hoffstätte wrote: > > On 8/8/19 10:08 PM, Heiner Kallweit wrote: > (..snip..) > >>> > >>> I was about to ask exactly that, whether you have TSO enabled. I don't > >>> know what > >>> can trigger the HW issue, it was just confirmed by Realtek that this chip

[PATCH net-next] tcp: batch calls to sk_flush_backlog()

2019-08-09 Thread Eric Dumazet
as Yeganeh Signed-off-by: Eric Dumazet --- net/ipv4/tcp.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index a0a66321c0ee99918b2080219dbaefcf3c398e13..f8fa1686f7f3e64f5d4ea8163e7f87538cc0d672 100644 --- a/net/ipv4/tcp.c +++ b/net/

Re: [PATCH v2 1/3] tipc: fix memory leak issue

2019-08-12 Thread Eric Dumazet
On 8/12/19 9:32 AM, Ying Xue wrote: > syzbot found the following memory leak: > > [ 68.602482][ T7130] kmemleak: 2 new suspected memory leaks (see > /sys/kernel/debug/kmemleak) > BUG: memory leak > unreferenced object 0x88810df83c00 (size 512): > comm "softirq", pid 0, jiffies 42949423

[PATCH net] nexthop: use nlmsg_parse_deprecated()

2019-08-12 Thread Eric Dumazet
xe7 Fixes: ab84be7e54fc ("net: Initial nexthop code") Signed-off-by: Eric Dumazet Reported-by: syzbot Cc: David Ahern --- net/ipv4/nexthop.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c index 5fe5a3981d4316ad8d9d

[PATCH net] batman-adv: fix uninit-value in batadv_netlink_get_ifindex()

2019-08-12 Thread Eric Dumazet
/0x70 net/socket.c:2305 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 RIP: 0033:0x440209 Fixes: b60620cf567b ("batman-adv: netlink: hardif query") Signed-off-by: Eric Dumazet Reported-by: syzbot Cc: Marek Lindner Cc: Simon Wunderlich C

Re: [PATCH v3 net-next 0/3] net: batched receive in GRO path

2019-08-12 Thread Eric Dumazet
On 8/12/19 7:51 PM, Ioana Ciocoi Radulescu wrote: >> -Original Message- >> From: Edward Cree >> Sent: Friday, August 9, 2019 8:32 PM >> To: Ioana Ciocoi Radulescu >> Cc: David Miller ; netdev ; >> Eric Dumazet ; linux-net-driv...@solarflare.com >

Re: [PATCH net] netlink: Fix nlmsg_parse as a wrapper for strict message parsing

2019-08-13 Thread Eric Dumazet
e with > NL_VALIDATE_STRICT for the validate argument very much like > nlmsg_parse_deprecated is for NL_VALIDATE_LIBERAL. > > Fixes: 3de6440354465 ("netlink: re-add parse/validate functions in strict > mode") > Reported-by: Eric Dumazet > Reported-by: syzbot >

[PATCH net] Revert "virtio_net: replace netdev_alloc_skb_ip_align() with napi_alloc_skb()"

2021-01-12 Thread Eric Dumazet
From: Eric Dumazet This reverts commit c67f5db82027ba6d2ea4ac9176bc45996a03ae6a. While using page fragments instead of a kmalloc-backed skb->head might give a small performance improvement in some cases, there is a huge risk of memory use underestimation. GOOD_COPY_LEN is 128 bytes. T

Re: [PATCH net 2/2] mptcp: better msk-level shutdown.

2021-01-13 Thread Eric Dumazet
On 1/12/21 6:25 PM, Paolo Abeni wrote: > Instead of re-implementing most of inet_shutdown, re-use > such helper, and implement the MPTCP-specific bits at the > 'proto' level. > > The msk-level disconnect() can now be invoked, let's provide a > suitable implementation. > > As a side effect, this

Re: [PATCH net 2/2] mptcp: better msk-level shutdown.

2021-01-13 Thread Eric Dumazet
On 1/13/21 11:21 AM, Eric Dumazet wrote: > > > On 1/12/21 6:25 PM, Paolo Abeni wrote: >> Instead of re-implementing most of inet_shutdown, re-use >> such helper, and implement the MPTCP-specific bits at the >> 'proto' level. >> >> The msk-leve

[PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs

2021-01-13 Thread Eric Dumazet
From: Eric Dumazet Both virtio net and napi_get_frags() allocate skbs with a very small skb->head. While using page fragments instead of a kmalloc-backed skb->head might give a small performance improvement in some cases, there is a huge risk of underestimating memory usage. Fo

[PATCH net] net_sched: reject silly cell_log in qdisc_get_rtab()

2021-01-14 Thread Eric Dumazet
From: Eric Dumazet iproute2 probably never goes beyond 8 for the cell exponent, but stick to the max shift exponent for signed 32bit. UBSAN reported: UBSAN: shift-out-of-bounds in net/sched/sch_api.c:389:22 shift exponent 130 is too large for 32-bit type 'int' CPU: 1 PID: 8450

[PATCH net] net_sched: gen_estimator: support large ewma log

2021-01-14 Thread Eric Dumazet
From: Eric Dumazet A syzbot report reminded us that very big ewma_log values were supported in the past, even if they made little sense. tc qdisc replace dev xxx root est 1sec 131072sec ... While fixing the bug, also add boundary checks for ewma_log, in line with the range supported by iproute2. UBSAN

[PATCH net] net_sched: avoid shift-out-of-bounds in tcindex_set_parms()

2021-01-14 Thread Eric Dumazet
From: Eric Dumazet tc_index being 16bit wide, we need to check that TCA_TCINDEX_SHIFT attribute is not silly. UBSAN: shift-out-of-bounds in net/sched/cls_tcindex.c:260:29 shift exponent 255 is too large for 32-bit type 'int' CPU: 0 PID: 8516 Comm: syz-executor228 Not tainted 5.10.0-sy
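The UBSAN reports in this entry and in the qdisc_get_rtab() entry share one root cause: in C, shifting an int by an amount greater than or equal to its width (here 255, or 130, against a 32-bit int) is undefined behavior, so a user-supplied shift must be validated before use. The following is a minimal user-space sketch of that check, not the kernel patch itself; the names shift_in_range() and tcindex_classify() are hypothetical, and the "mask, then shift" classification step is an assumption for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a shift is only meaningful (and defined) when it
 * is strictly smaller than the bit width of the value being shifted. */
static bool shift_in_range(unsigned int shift, unsigned int width_bits)
{
	return shift < width_bits;
}

/* Hypothetical classification step on a 16-bit tc_index: mask, then shift.
 * Shifts >= 16 are rejected: 16..31 would always yield 0 for a 16-bit
 * value, and >= 32 would be undefined for the promoted int.
 * Returns -1 for an out-of-range shift (real code would reject it when
 * the attribute is parsed, not per packet). */
static int tcindex_classify(uint16_t tc_index, uint16_t mask, unsigned int shift)
{
	if (!shift_in_range(shift, 16))
		return -1;
	return (tc_index & mask) >> shift;
}
```

For example, tcindex_classify(0xABCD, 0xFF00, 8) keeps the high byte and returns 0xAB, while a shift of 255 is refused instead of reaching the undefined shift.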

Re: [PATCH net] mptcp: fix locking in mptcp_disconnect()

2021-01-14 Thread Eric Dumazet
On 1/14/21 4:37 PM, Paolo Abeni wrote: > tcp_disconnect() expects the caller acquires the sock lock, > but mptcp_disconnect() is not doing that. Add the missing > required lock. > > Reported-by: Eric Dumazet > Fixes: 76e2a55d1625 ("mptcp: better msk-level shutdown."

[PATCH net] tcp: do not mess with cloned skbs in tcp_add_backlog()

2021-01-19 Thread Eric Dumazet
From: Eric Dumazet Heiner Kallweit reported that some skbs were sent with the following invalid GSO properties: - gso_size > 0 - gso_type == 0 This was triggering a WARN_ON_ONCE() in rtl8169_tso_csum_v2. Juerg Haefliger was able to reproduce a similar issue using a lan78xx NIC and a workl

Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-21 Thread Eric Dumazet
On 1/21/21 2:47 PM, Xuan Zhuo wrote: > This patch is used to construct skb based on page to save memory copy > overhead. > > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the > network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to > directly construct skb. If

Re: [PATCH 1/1] net/ipv4/inet_fragment: Batch fqdir destroy works

2020-12-09 Thread Eric Dumazet
On 12/8/20 10:45 AM, SeongJae Park wrote: > From: SeongJae Park > > In 'fqdir_exit()', a work for destruction of the 'fqdir' is enqueued. > The work function, 'fqdir_work_fn()', calls 'rcu_barrier()'. In case of > intensive 'fqdir_exit()' (e.g., frequent 'unshare(CLONE_NEWNET)' > system calls)

Re: [PATCH net] tcp: fix cwnd-limited bug for TSO deferral where we send nothing

2020-12-10 Thread Eric Dumazet
dwell ; Ingemar Johansson S >> ; Yuchung Cheng >> ; Soheil Hassas Yeganeh ; Eric >> Dumazet >> Subject: Re: [PATCH net] tcp: fix cwnd-limited bug for TSO deferral where we >> send nothing >> >> On Tue, 8 Dec 2020 22:57:59 -0500 Neal Cardwell wrote: >>> F

[PATCH bpf] bpf: add schedule point in htab_init_buckets()

2020-12-21 Thread Eric Dumazet
From: Eric Dumazet We noticed that with a LOCKDEP enabled kernel, allocating a hash table with 65536 buckets would use more than 60ms. htab_init_buckets() runs from process context, it is safe to schedule to avoid latency spikes. Fixes: c50eb518e262 ("bpf: Use separate lockdep class for

Re: [PATCH v3] net: neighbor: fix a crash caused by mod zero

2020-12-22 Thread Eric Dumazet
On 12/22/20 1:38 PM, weichenchen wrote: > pneigh_enqueue() tries to obtain a random delay by mod > NEIGH_VAR(p, PROXY_DELAY). However, NEIGH_VAR(p, PROXY_DELAY) > might be zero at that point because someone could write zero > to /proc/sys/net/ipv4/neigh/[device]/proxy_delay after the > callers c
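The crash described in this entry is a check-then-use race: a caller tests the tunable, a concurrent write sets it to zero, and the later re-read is used as a modulo divisor. A common way to avoid this class of bug is to read the tunable exactly once and use that snapshot for both the check and the arithmetic. The following is a generic user-space sketch under that assumption, not the v3 patch's actual change; proxy_delay and random_proxy_delay() are hypothetical stand-ins for the sysctl and for pneigh_enqueue()'s delay computation.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical tunable that another thread may rewrite at any time,
 * standing in for /proc/sys/net/ipv4/neigh/[device]/proxy_delay. */
static _Atomic unsigned long proxy_delay = 100;

/* Returns a random delay in [0, proxy_delay), or 0 if the tunable is 0.
 * The tunable is loaded once, so the zero check and the modulo always
 * operate on the same value and a concurrent write cannot cause a
 * division by zero. */
static unsigned long random_proxy_delay(void)
{
	unsigned long delay = atomic_load(&proxy_delay);

	if (delay == 0)
		return 0;
	return (unsigned long)rand() % delay;
}
```

Writing zero to the tunable between calls now simply yields a zero delay instead of a crash.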
