Re: [PATCH net] udp: correct reuseport selection with connected sockets

2019-09-13 Thread Craig Gallek
nal cost for the BPF case and just a single branch for the unconnected udp, tcp listener case! Acked-by: Craig Gallek

Re: [PATCH bpf 1/2] bpf: udp: ipv6: Avoid running reuseport's bpf_prog from __udp6_lib_err

2019-06-03 Thread Craig Gallek
Pv4, which has passed a NULL skb pointer to > reuseport_select_sock(). > > Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") > Cc: Craig Gallek > Signed-off-by: Martin KaFai Lau Acked-by: Craig Gallek

Re: [PATCH bpf v2] bpf, lpm: fix lookup bug in map_delete_elem

2019-02-22 Thread Craig Gallek
udo ./tools/testing/selftests/bpf/test_lpm_map > test_lpm_map: test_lpm_map.c:485: test_lpm_delete: Assertion > `bpf_map_delete_elem(map_fd, key) == -1 && errno == ENOENT' failed. > Aborted > > With the patch: test_lpm_map runs without errors. > > Fixes: e454

Re: [PATCH net] sock_diag: fix use-after-free read in __sk_free

2018-05-18 Thread Craig Gallek
count:0 mapping:88018a02c140 index:0x0 > compound_mapcount: 0 > flags: 0x2fffc008100(slab|head) > raw: 02fffc0000008100 88018a02c140 00010001 > raw: ea00062a1320 ea0006268020 8801d9bdde40 > page dumped because: kasan: bad access detected > > Fixes: b922622ec6ef ("sock_diag: don't broadcast kernel sockets") > Signed-off-by: Eric Dumazet > Cc: Craig Gallek > Reported-by: syzbot Acked-by: Craig Gallek Thanks Eric!

Re: [PATCH net] soreuseport: fix mem leak in reuseport_add_sock()

2018-02-02 Thread Craig Gallek
> Signed-off-by: Eric Dumazet > Reported-by: syzbot+c0ea2226f77a42936...@syzkaller.appspotmail.com Clever fix, thanks Eric(s)! Acked-by: Craig Gallek

Re: [PATCH net] ipv6: Fix SO_REUSEPORT UDP socket with implicit sk_ipv6only

2018-01-25 Thread Craig Gallek
2ea7e74727 ("soreuseport: fast reuseport UDP socket selection") > Signed-off-by: Martin KaFai Lau Wow, good catch! Acked-by: Craig Gallek

Re: [PATCH net v2] netns, rtnetlink: fix struct net reference leak

2017-12-29 Thread Craig Gallek
On Sat, Dec 23, 2017 at 5:12 PM, Nicolas Dichtel wrote: > Le 22/12/2017 à 21:36, Craig Gallek a écrit : >> From: Craig Gallek >> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c >> index 60a71be75aea..4b7ea33f5705 100644 >> --- a/net/core/net_na

[PATCH net v2] netns, rtnetlink: fix struct net reference leak

2017-12-22 Thread Craig Gallek
From: Craig Gallek netns ids were added in commit 0c7aecd4bde4 and defined as signed integers in both the kernel data structures and the netlink interface. However, the semantics of the implementation assume that the ids are always greater than or equal to zero, except for an internal sentinel

Re: [PATCH net] rtnetlink: fix struct net reference leak

2017-12-22 Thread Craig Gallek
On Fri, Dec 22, 2017 at 8:59 AM, Craig Gallek wrote: > On Fri, Dec 22, 2017 at 3:11 AM, Nicolas Dichtel > wrote: >> Le 21/12/2017 à 23:18, Craig Gallek a écrit : >>> From: Craig Gallek >>> >>> The below referenced commit extended the RTM_GETLINK interface

Re: [PATCH net] rtnetlink: fix struct net reference leak

2017-12-22 Thread Craig Gallek
On Fri, Dec 22, 2017 at 3:11 AM, Nicolas Dichtel wrote: > Le 21/12/2017 à 23:18, Craig Gallek a écrit : >> From: Craig Gallek >> >> The below referenced commit extended the RTM_GETLINK interface to >> allow querying by netns id. The netnsid property was previously >

[PATCH net] rtnetlink: fix struct net reference leak

2017-12-21 Thread Craig Gallek
From: Craig Gallek The below referenced commit extended the RTM_GETLINK interface to allow querying by netns id. The netnsid property was previously defined as a signed integer, but this patch assumes that the user always passes a positive integer. syzkaller discovered this problem by setting

Re: [RFC PATCH] reuseport: compute the ehash only if needed

2017-12-12 Thread Craig Gallek
On Tue, Dec 12, 2017 at 8:09 AM, Paolo Abeni wrote: > When a reuseport socket group is using a BPF filter to distribute > the packets among the sockets, we don't need to compute any hash > value, but the current reuseport_select_sock() requires the > caller to compute such hash in advance. > > Thi

Re: Uninitialized value in __sk_nulls_add_node_rcu()

2017-12-05 Thread Craig Gallek
On Tue, Dec 5, 2017 at 3:07 PM, Eric Dumazet wrote: > On Tue, 2017-12-05 at 14:39 -0500, Craig Gallek wrote: >> On Tue, Dec 5, 2017 at 9:18 AM, Eric Dumazet >> wrote: >> > On Tue, 2017-12-05 at 06:15 -0800, Eric Dumazet wrote: >> > > >> > > + h

Re: Uninitialized value in __sk_nulls_add_node_rcu()

2017-12-05 Thread Craig Gallek
On Tue, Dec 5, 2017 at 9:18 AM, Eric Dumazet wrote: > On Tue, 2017-12-05 at 06:15 -0800, Eric Dumazet wrote: >> >> + hlist_nulls_add_head_rcu(&sk->sk_nulss_node, list); > > Typo here, this needs sk_nulls_node of course. > Thanks Eric, this looks good to me. The tail insertion is still requir

Re: [PATCH net-next] net/reuseport: drop legacy code

2017-11-30 Thread Craig Gallek
ody, so that we can drop some duplicate > code in the ipv4 and ipv6 stack. > > This also allows faster lookup in the above scenario and will allow > us to avoid computing the hash value for successful, BPF based > demultiplexing - in a later patch. > > Signed-off-by: Paolo Aben

[PATCH net-next v2] bpf: fix verifier NULL pointer dereference

2017-11-02 Thread Craig Gallek
From: Craig Gallek do_check() can fail early without allocating env->cur_state under memory pressure. Syzkaller found the stack below on the linux-next tree because of this. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access gene

Re: [PATCH net-next] bpf: fix verifier NULL pointer dereference

2017-11-02 Thread Craig Gallek
On Thu, Nov 2, 2017 at 11:07 AM, Alexei Starovoitov wrote: > On 11/2/17 7:21 AM, Craig Gallek wrote: >> >> From: Craig Gallek >> >> do_check() can fail early without allocating env->cur_state under >> memory pressure. Syzkaller found the stack below on th

[PATCH net-next] bpf: fix verifier NULL pointer dereference

2017-11-02 Thread Craig Gallek
From: Craig Gallek do_check() can fail early without allocating env->cur_state under memory pressure. Syzkaller found the stack below on the linux-next tree because of this. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access gene

[PATCH net] tun/tap: sanitize TUNSETSNDBUF input

2017-10-30 Thread Craig Gallek
From: Craig Gallek Syzkaller found several variants of the lockup below by setting negative values with the TUNSETSNDBUF ioctl. This patch adds a sanity check to both the tun and tap versions of this ioctl. watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [repro:2389] Modules linked in
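
The sanity check described in this preview can be sketched as follows. This is only an illustration, assuming the fix amounts to rejecting non-positive values before they are stored; the helper name `validate_sndbuf` is hypothetical and does not come from the patch.

```c
#include <errno.h>

/* Hypothetical helper (not from the patch): reject send-buffer sizes
 * that are zero or negative before they are applied to the socket. */
static int validate_sndbuf(int sndbuf)
{
	if (sndbuf <= 0)
		return -EINVAL;
	return 0;
}
```

A caller would run this on the value copied from user space and return the error instead of applying it.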

[PATCH net] soreuseport: fix initialization race

2017-10-19 Thread Craig Gallek
From: Craig Gallek Syzkaller stumbled upon a way to trigger WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41 reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39 There are two initialization paths for the sock_reuseport structure in a socket: Through the udp/tcp bind paths of

[PATCH net-next v3 1/2] libbpf: parse maps sections of varying size

2017-10-05 Thread Craig Gallek
From: Craig Gallek This library previously assumed a fixed-size map options structure. Any new options were ignored. In order to allow the options structure to grow and to support parsing older programs, this patch updates the maps section parsing to handle varying sizes. Object files with
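
A rough sketch of how a parser might accept map definitions of varying size. It assumes a shorter definition is zero-extended and a longer one is accepted only when its unknown trailing bytes are zero; that second rule is an assumption, since the preview is cut off before it says how larger definitions are treated. `struct map_def` and `parse_map_def` are hypothetical stand-ins, not libbpf's actual types or functions.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the library's map definition structure. */
struct map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
};

/* Copy one on-disk definition of src_sz bytes into *out.
 * Shorter (older) definitions are zero-extended. Longer (newer) ones
 * are accepted only if every byte we do not understand is zero. */
static int parse_map_def(const unsigned char *src, size_t src_sz,
			 struct map_def *out)
{
	size_t i;

	memset(out, 0, sizeof(*out));
	if (src_sz <= sizeof(*out)) {
		memcpy(out, src, src_sz);
		return 0;
	}
	for (i = sizeof(*out); i < src_sz; i++)
		if (src[i] != 0)
			return -EINVAL;
	memcpy(out, src, sizeof(*out));
	return 0;
}
```

The per-definition size passed as `src_sz` would come from dividing the section size by the number of maps, under the assumption that all definitions in a section are equally sized.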

[PATCH net-next v3 0/2] libbpf: support more map options

2017-10-05 Thread Craig Gallek
From: Craig Gallek The functional change to this series is the ability to use flags when creating maps from object files loaded by libbpf. In order to do this, the first patch updates the library to handle map definitions that differ in size from libbpf's struct bpf_map_def. For object

[PATCH net-next v3 2/2] libbpf: use map_flags when creating maps

2017-10-05 Thread Craig Gallek
From: Craig Gallek This is required to use BPF_MAP_TYPE_LPM_TRIE or any other map type which requires flags. Signed-off-by: Craig Gallek --- tools/lib/bpf/libbpf.c | 2 +- tools/lib/bpf/libbpf.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/lib/bpf/libbpf.c b

Re: [PATCH net-next v2 1/2] libbpf: parse maps sections of varying size

2017-10-04 Thread Craig Gallek
On Tue, Oct 3, 2017 at 10:39 AM, Daniel Borkmann wrote: > On 10/03/2017 01:07 AM, Alexei Starovoitov wrote: >> >> On 10/2/17 9:41 AM, Craig Gallek wrote: >>> >>> +/* Assume equally sized map definitions */ >>> +map_def_sz = data->d_size /

Re: [PATCH net-next v2 1/2] libbpf: parse maps sections of varying size

2017-10-04 Thread Craig Gallek
On Tue, Oct 3, 2017 at 10:11 AM, Jesper Dangaard Brouer wrote: > On Mon, 2 Oct 2017 12:41:28 -0400 > Craig Gallek wrote: > >> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c >> index 4f402dcdf372..28b300868ad7 100644 >> --- a/tools/lib/bpf/libbpf.c >

Re: [PATCH net-next v2 1/2] libbpf: parse maps sections of varying size

2017-10-04 Thread Craig Gallek
On Tue, Oct 3, 2017 at 10:03 AM, Jesper Dangaard Brouer wrote: > > > First of all, thank you Craig for working on this. As Alexei says, we > need to improve tools/lib/bpf/libbpf and move towards converting users > of bpf_load.c to this lib instead. > > Comments inlined below. > >> +

[PATCH net-next v2 2/2] libbpf: use map_flags when creating maps

2017-10-02 Thread Craig Gallek
From: Craig Gallek This is required to use BPF_MAP_TYPE_LPM_TRIE or any other map type which requires flags. Signed-off-by: Craig Gallek --- tools/lib/bpf/libbpf.c | 2 +- tools/lib/bpf/libbpf.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/lib/bpf/libbpf.c b

[PATCH net-next v2 1/2] libbpf: parse maps sections of varying size

2017-10-02 Thread Craig Gallek
From: Craig Gallek This library previously assumed a fixed-size map options structure. Any new options were ignored. In order to allow the options structure to grow and to support parsing older programs, this patch updates the maps section parsing to handle varying sizes. Object files with

[PATCH net-next v2 0/2] libbpf: support more map options

2017-10-02 Thread Craig Gallek
From: Craig Gallek The functional change to this series is the ability to use flags when creating maps from object files loaded by libbpf. In order to do this, the first patch updates the library to handle map definitions that differ in size from libbpf's struct bpf_map_def. For object

Re: [PATCH net-next] libbpf: use map_flags when creating maps

2017-09-28 Thread Craig Gallek
On Wed, Sep 27, 2017 at 6:03 PM, Daniel Borkmann wrote: > On 09/27/2017 06:29 PM, Alexei Starovoitov wrote: >> >> On 9/27/17 7:04 AM, Craig Gallek wrote: >>> >>> From: Craig Gallek >>> >>> This extends struct bpf_map_def to include a flags fi

[PATCH net-next] libbpf: use map_flags when creating maps

2017-09-27 Thread Craig Gallek
From: Craig Gallek This extends struct bpf_map_def to include a flags field. Note that this has the potential to break the validation logic in bpf_object__validate_maps and bpf_object__init_maps as they use sizeof(struct bpf_map_def) as a minimal allowable size of a map section. Any bpf program

[PATCH net-next v2] bpf: Optimize lpm trie delete

2017-09-21 Thread Craig Gallek
From: Craig Gallek Before the delete operator was added, this datastructure maintained an invariant that intermediate nodes were only present when necessary to build the tree. This patch updates the delete operation to reinstate that invariant by removing unnecessary intermediate nodes after a

Re: [PATCH net-next] bpf: Optimize lpm trie delete

2017-09-21 Thread Craig Gallek
On Wed, Sep 20, 2017 at 6:56 PM, Daniel Mack wrote: > On 09/20/2017 08:51 PM, Craig Gallek wrote: >> On Wed, Sep 20, 2017 at 12:51 PM, Daniel Mack wrote: >>> Hi Craig, >>> >>> Thanks, this looks much cleaner already :) >>> >>> On 09/20/2017

Re: [PATCH net-next] bpf: Optimize lpm trie delete

2017-09-20 Thread Craig Gallek
On Wed, Sep 20, 2017 at 12:51 PM, Daniel Mack wrote: > Hi Craig, > > Thanks, this looks much cleaner already :) > > On 09/20/2017 06:22 PM, Craig Gallek wrote: >> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c >> index 9d58a576b2ae..b5a7d70ec8b5 100644 >

[PATCH net-next] bpf: Optimize lpm trie delete

2017-09-20 Thread Craig Gallek
From: Craig Gallek Before the delete operator was added, this datastructure maintained an invariant that intermediate nodes were only present when necessary to build the tree. This patch updates the delete operation to reinstate that invariant by removing unnecessary intermediate nodes after a

Re: [PATCH net-next 0/3] Implement delete for BPF LPM trie

2017-09-19 Thread Craig Gallek
On Tue, Sep 19, 2017 at 5:13 PM, Daniel Mack wrote: > On 09/19/2017 10:55 PM, David Miller wrote: >> From: Craig Gallek >> Date: Mon, 18 Sep 2017 15:30:54 -0400 >> >>> This was previously left as a TODO. Add the implementation and >>> extend the test to

Re: [PATCH net-next 1/3] bpf: Implement map_delete_elem for BPF_MAP_TYPE_LPM_TRIE

2017-09-19 Thread Craig Gallek
On Mon, Sep 18, 2017 at 6:53 PM, Alexei Starovoitov wrote: Thanks for the review! Please correct me if I'm wrong... > On 9/18/17 12:30 PM, Craig Gallek wrote: >> >> From: Craig Gallek >> >> This is a simple non-recursive delete operation. It prunes paths >>

[PATCH net-next 3/3] bpf: Test deletion in BPF_MAP_TYPE_LPM_TRIE

2017-09-18 Thread Craig Gallek
From: Craig Gallek Extend the 'random' operation tests to include a delete operation (delete half of the nodes from both lpm implementations and ensure that lookups are still equivalent). Also, add a simple IPv4 test which verifies lookup behavior as nodes are deleted from the tree.

[PATCH net-next 0/3] Implement delete for BPF LPM trie

2017-09-18 Thread Craig Gallek
From: Craig Gallek This was previously left as a TODO. Add the implementation and extend the test to cover it. Craig Gallek (3): bpf: Implement map_delete_elem for BPF_MAP_TYPE_LPM_TRIE bpf: Add uniqueness invariant to trivial lpm test implementation bpf: Test deletion in

[PATCH net-next 2/3] bpf: Add uniqueness invariant to trivial lpm test implementation

2017-09-18 Thread Craig Gallek
From: Craig Gallek The 'trivial' lpm implementation in this test allows equivalent nodes to be added (that is, nodes consisting of the same prefix and prefix length). For lookup operations, this is fine because insertion happens at the head of the (singly linked) list and the first,

[PATCH net-next 1/3] bpf: Implement map_delete_elem for BPF_MAP_TYPE_LPM_TRIE

2017-09-18 Thread Craig Gallek
From: Craig Gallek This is a simple non-recursive delete operation. It prunes paths of empty nodes in the tree, but it does not try to further compress the tree as nodes are removed. Signed-off-by: Craig Gallek --- kernel/bpf/lpm_trie.c | 80

[PATCH net-next] dsa: fix flow disector null pointer

2017-08-15 Thread Craig Gallek
From: Craig Gallek A recent change to fix up DSA device behavior made the assumption that all skbs passing through the flow disector will be associated with a device. This does not appear to be a safe assumption. Syzkaller found the crash below by attaching a BPF socket filter that tries to

Re: [PATCH net-next v3 07/15] bpf: Add setsockopt helper function to bpf

2017-06-21 Thread Craig Gallek
On Wed, Jun 21, 2017 at 12:51 PM, Lawrence Brakmo wrote: > > On 6/20/17, 2:25 PM, "Craig Gallek" wrote: > > On Mon, Jun 19, 2017 at 11:00 PM, Lawrence Brakmo wrote: > > Added support for calling a subset of socket setsockopts from > > BPF_PROG_TY

Re: [PATCH net-next v3 07/15] bpf: Add setsockopt helper function to bpf

2017-06-20 Thread Craig Gallek
On Mon, Jun 19, 2017 at 11:00 PM, Lawrence Brakmo wrote: > Added support for calling a subset of socket setsockopts from > BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather > than making the changes to call the socket setsockopt function because > the changes required would have been

Re: Leak in ipv6_gso_segment()?

2017-06-02 Thread Craig Gallek
On Fri, Jun 2, 2017 at 2:25 PM, Craig Gallek wrote: > On Fri, Jun 2, 2017 at 2:05 PM, David Miller wrote: >> From: Ben Hutchings >> Date: Wed, 31 May 2017 13:26:02 +0100 >> >>> If I'm not mistaken, ipv6_gso_segment() now leaks segs if >>> ip6_find_1

Re: Leak in ipv6_gso_segment()?

2017-06-02 Thread Craig Gallek
On Fri, Jun 2, 2017 at 2:05 PM, David Miller wrote: > From: Ben Hutchings > Date: Wed, 31 May 2017 13:26:02 +0100 > >> If I'm not mistaken, ipv6_gso_segment() now leaks segs if >> ip6_find_1stfragopt() fails. I'm not sure whether the fix would be as >> simple as adding a kfree_skb(segs) or wheth

Re: [PATCH net] ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()

2017-05-31 Thread Craig Gallek
implementations to the original ip6_find_1stfragopt and may very well suffer from the same bug I was trying to fix. Maybe it doesn't matter since that bug relied on the user changing the v6 nexthdr field. I need to understand the mip6 code first... In any event, I think this patch applies on its own. Thanks again. Acked-by: Craig Gallek

Re: [net:master 9/12] net/ipv6/ip6_offload.c:120:7-21: WARNING: Unsigned expression compared with zero: unfrag_ip6hlen < 0 (fwd)

2017-05-18 Thread Craig Gallek
On Wed, May 17, 2017 at 10:58 PM, David Miller wrote: > From: Julia Lawall > Date: Thu, 18 May 2017 10:01:07 +0800 (SGT) > >> It may be worth checking on these. The code context is shown in the first >> case (line 120). For the others, at least it gives the line numbers. > ... net/ipv6/ip

[PATCH net-next] ipv6: Prevent overrun when parsing v6 header options

2017-05-16 Thread Craig Gallek
From: Craig Gallek The KASAN warning reported below was discovered with a syzkaller program. The reproducer is basically: int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP); send(s, &one_byte_of_data, 1, MSG_MORE); send(s, &more_than_mtu_bytes_data, 2000, 0); The socket() call
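
To illustrate the class of bug, here is a minimal bounds-checked walk over IPv6 extension headers. It is a sketch, not the kernel's parser: the function name `skip_ext_headers` is hypothetical, and only the hop-by-hop (0), routing (43), and destination options (60) header types are treated as extension headers. Each such header stores the next header type in byte 0 and its length in byte 1, in 8-octet units not counting the first 8 octets.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: starting with header type nexthdr at buf[0],
 * skip extension headers and return the offset of the first header
 * that is not one, or -EINVAL if a header would run past len. */
static int skip_ext_headers(const unsigned char *buf, size_t len,
			    unsigned char nexthdr)
{
	size_t off = 0;

	while (nexthdr == 0 || nexthdr == 43 || nexthdr == 60) {
		size_t hlen;

		/* Need at least the next-header and length bytes. */
		if (len - off < 2)
			return -EINVAL;
		hlen = ((size_t)buf[off + 1] + 1) * 8;
		/* The whole header must fit in what is left. */
		if (hlen > len - off)
			return -EINVAL;
		nexthdr = buf[off];
		off += hlen;
	}
	return (int)off;
}
```

With a one-byte payload and a protocol of hop-by-hop, as in the reproducer, the first check fails and the walk stops instead of reading past the buffer.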

Re: [PATCH] ipv6: Need to export ipv6_push_frag_opts for tunneling now.

2017-05-01 Thread Craig Gallek
ler Woops, sorry I missed this. Thanks for the fix! Acked-by: Craig Gallek

[PATCH v2 net-next] ip6_tunnel: Fix missing tunnel encapsulation limit option

2017-04-26 Thread Craig Gallek
From: Craig Gallek The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and IPV6_TLV_PADN options when an encapsulation limit is defined (the default is a limit of 4). An MTU adjustment is done to account for these options as well. However, the options are never present in the

Re: [PATCH net-next] ip6_tunnel: Fix missing tunnel encapsulation limit option

2017-04-26 Thread Craig Gallek
On Wed, Apr 26, 2017 at 1:07 PM, Craig Gallek wrote: > From: Craig Gallek > > The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and > IPV6_TLV_PADN options when an encapsulation limit is defined (the > default is a limit of 4). An MTU adjustment is done to acco

[PATCH net-next] ip6_tunnel: Fix missing tunnel encapsulation limit option

2017-04-26 Thread Craig Gallek
From: Craig Gallek The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and IPV6_TLV_PADN options when an encapsulation limit is defined (the default is a limit of 4). An MTU adjustment is done to account for these options as well. However, the options are never present in the

[PATCH iproute2] gre6: fix copy/paste bugs in GREv6 attribute manipulation

2017-04-21 Thread Craig Gallek
From: Craig Gallek Fixes: af89576d7a8c("iproute2: GRE over IPv6 tunnel support.") Signed-off-by: Craig Gallek --- ip/link_gre6.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ip/link_gre6.c b/ip/link_gre6.c index a91f635760fa..1b4fb051b37f 100644 --- a/ip/l

[PATCH iproute2] iplink: Expose IFLA_*_FWMARK attributes for supported link types

2017-04-21 Thread Craig Gallek
From: Craig Gallek This attribute allows the administrator to adjust the packet marking attribute of tunnels that support policy based routing. Signed-off-by: Craig Gallek --- include/linux/if_tunnel.h | 3 +++ ip/link_gre.c | 16 ip/link_gre6.c| 24

[PATCH net-next 1/2] ip6_tunnel: Allow policy-based routing through tunnels

2017-04-19 Thread Craig Gallek
From: Craig Gallek This feature allows the administrator to set an fwmark for packets traversing a tunnel. This allows the use of independent routing tables for tunneled packets without the use of iptables. Signed-off-by: Craig Gallek --- include/net/ip6_tunnel.h | 2 ++ include/uapi

[PATCH net-next 2/2] ip_tunnel: Allow policy-based routing through tunnels

2017-04-19 Thread Craig Gallek
From: Craig Gallek This feature allows the administrator to set an fwmark for packets traversing a tunnel. This allows the use of independent routing tables for tunneled packets without the use of iptables. There is no concept of per-packet routing decisions through IPv4 tunnels, so this

[PATCH net-next 0/2] ip_tunnel: Allow policy-based routing through tunnels

2017-04-19 Thread Craig Gallek
From: Craig Gallek iproute2 changes to follow. Example usage: ip link add gre-test type gre local 10.0.0.1 remote 10.0.0.2 fwmark 0x4 ip -detail link show gre-test ... ip link set gre-test type gre fwmark 0 Craig Gallek (2): ip6_tunnel: Allow policy-based routing through tunnels

Re: [PATCH] soreuseport: use "unsigned int" in __reuseport_alloc()

2017-04-03 Thread Craig Gallek
On Sun, Apr 2, 2017 at 6:18 PM, Alexey Dobriyan wrote: > Number of sockets is limited by 16-bit, so 64-bit allocation will never > happen. > > 16-bit ops are the worst code density-wise on x86_64 because of > additional prefix (66). So this boils down to a compiled code density vs a readability/ma

Re: [PATCH 3/5] net/packet: fix overflow in check for tp_frame_nr

2017-03-29 Thread Craig Gallek
On Tue, Mar 28, 2017 at 1:19 PM, Andrey Konovalov wrote: > On Tue, Mar 28, 2017 at 5:54 PM, Craig Gallek wrote: >> On Tue, Mar 28, 2017 at 10:00 AM, Andrey Konovalov >> wrote: >>> When calculating rb->frames_per_block * req->tp_block_nr the result >>>

Re: [PATCH 3/5] net/packet: fix overflow in check for tp_frame_nr

2017-03-28 Thread Craig Gallek
On Tue, Mar 28, 2017 at 10:00 AM, Andrey Konovalov wrote: > When calculating rb->frames_per_block * req->tp_block_nr the result > can overflow. > > Add a check that tp_block_size * tp_block_nr <= UINT_MAX. > > Since frames_per_block <= tp_block_size, the expression would > never overflow. > > Sign
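
A check of the form "tp_block_size * tp_block_nr <= UINT_MAX" has to be written so that the test itself cannot overflow. One common way, sketched here with a hypothetical helper name, is to divide instead of multiply. This is an illustration of the idea and not the code from the patch.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if block_size * block_nr fits in an
 * unsigned int. Uses division so the test cannot itself overflow. */
static bool blocks_fit_uint(unsigned int block_size, unsigned int block_nr)
{
	if (block_size == 0)
		return true;
	return block_nr <= UINT_MAX / block_size;
}
```

Because frames_per_block <= tp_block_size, passing this check also bounds frames_per_block * tp_block_nr, which is the argument made in the quoted patch description.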

Re: [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one

2017-01-12 Thread Craig Gallek
On Wed, Jan 11, 2017 at 3:19 PM, Josef Bacik wrote: > +int inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, > +bool match_wildcard) > +{ > +#if IS_ENABLED(CONFIG_IPV6) > + if (sk->sk_family == AF_INET6) Still wrapping my head around this, so take it

Re: [PATCH 5/5 net-next] inet: reset tb->fastreuseport when adding a reuseport sk

2016-12-21 Thread Craig Gallek
On Tue, Dec 20, 2016 at 3:07 PM, Josef Bacik wrote: > If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 > and > never set it again. Which means that in the future if we end up adding a > bunch > of reuseport sk's to that tb we'll have to do the expensive scan every tim

Re: Soft lockup in inet_put_port on 4.6

2016-12-15 Thread Craig Gallek
On Thu, Dec 15, 2016 at 5:39 PM, Tom Herbert wrote: > On Thu, Dec 15, 2016 at 10:53 AM, Josef Bacik wrote: >> On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote: >>> >>> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek >>> wrote: >>>> >>>>

Re: [PATCH net-next 2/2] inet: Fix get port to handle zero port number with soreuseport set

2016-12-15 Thread Craig Gallek
On Wed, Dec 14, 2016 at 7:54 PM, Tom Herbert wrote: > A user may call listen with binding an explicit port with the intent > that the kernel will assign an available port to the socket. In this > case inet_csk_get_port does a port scan. For such sockets, the user may > also set soreuseport with th

Re: Soft lockup in inet_put_port on 4.6

2016-12-13 Thread Craig Gallek
On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote: > I think there may be some suspicious code in inet_csk_get_port. At > tb_found there is: > > if (((tb->fastreuse > 0 && reuse) || > (tb->fastreuseport > 0 && > !rcu_access_pointer(sk->sk

[PATCH net] inet: Fix missing return value in inet6_hash

2016-10-25 Thread Craig Gallek
From: Craig Gallek As part of a series to implement faster SO_REUSEPORT lookups, commit 086c653f5862 ("sock: struct proto hash function may error") added return values to protocol hash functions and commit 496611d7b5ea ("inet: create IPv6-equivalent inet_hash function") im

Re: [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable

2016-07-07 Thread Craig Gallek
On Thu, Jul 7, 2016 at 4:36 PM, Jiri Kosina wrote: > From: Jiri Kosina > > Convert the per-device linked list into a hashtable. The primary > motivation for this change is that currently, we're not tracking all the > qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup > performed o

[PATCH net-next] tun: Don't assume type tun in tun_device_event

2016-07-06 Thread Craig Gallek
From: Craig Gallek The referenced change added a netlink notifier for processing device queue size events. These events are fired for all devices but the registered callback assumed they only occurred for tun devices. This fix adds a check (borrowed from macvtap.c) to discard non-tun device

Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun

2016-07-06 Thread Craig Gallek
On Thu, Jun 30, 2016 at 2:45 AM, Jason Wang wrote: > Hi all: > > This series tries to switch to use skb array in tun. This is used to > eliminate the spinlock contention between producer and consumer. The > conversion was straightforward: just introduce a tx skb array and use > it instead of sk_rec

Re: [PATCH] soreuseport: add compat case for setsockopt SO_ATTACH_REUSEPORT_CBPF

2016-06-03 Thread Craig Gallek
On Fri, Jun 3, 2016 at 5:09 PM, Helge Deller wrote: > Any idea for a better naming than "do_sockopt_fix_sock_fprog()" ? Thanks for catching and fixing this. I'd suggest simply leaving the function name as-is. Your fix to the condition in that function is sufficient to address the issue. Craig

Re: [PATCH] soreuseport: Fix reuseport_bpf testcase on 32bit architectures

2016-06-03 Thread Craig Gallek
fferent > size [-Wpointer-to-int-cast] > > Signed-off-by: Helge Deller Acked-by: Craig Gallek Thanks!

[PATCH v3 net] soreuseport: Fix TCP listener hash collision

2016-04-28 Thread Craig Gallek
From: Craig Gallek I forgot to include a check for listener port equality when deciding if two sockets should belong to the same reuseport group. This was not caught previously because it's only necessary when two listening sockets for the same user happen to hash to the same listener b
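
The grouping decision described here can be sketched roughly as below. The struct and function are hypothetical simplifications, not the kernel's data structures. The fields compared (owning user, receive address, port) are assumptions drawn from the patch descriptions in this thread listing, with the port comparison being the check the fix adds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a listening socket. */
struct listener {
	uint32_t uid;       /* owning user */
	uint32_t rcv_saddr; /* bound receive address */
	uint16_t port;      /* bound port */
	bool reuseport;     /* SO_REUSEPORT set */
};

/* Two listeners that land in the same hash bucket belong to the same
 * reuseport group only if all of these match. Without the port
 * comparison, two unrelated listeners of one user that collide in a
 * bucket would be grouped together. */
static bool same_reuseport_group(const struct listener *a,
				 const struct listener *b)
{
	return a->reuseport && b->reuseport &&
	       a->uid == b->uid &&
	       a->rcv_saddr == b->rcv_saddr &&
	       a->port == b->port;
}
```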

Re: [PATCH v2 net] soreuseport: Fix TCP listener hash collision

2016-04-28 Thread Craig Gallek
On Thu, Apr 28, 2016 at 5:59 PM, Eric Dumazet wrote: > On Thu, 2016-04-28 at 17:07 -0400, Craig Gallek wrote: >> From: Craig Gallek >> >> I forgot to include a check for listener port equality when deciding >> if two sockets should belong to the same reuseport gro

[PATCH v2 net] soreuseport: Fix TCP listener hash collision

2016-04-28 Thread Craig Gallek
From: Craig Gallek I forgot to include a check for listener port equality when deciding if two sockets should belong to the same reuseport group. This was not caught previously because it's only necessary when two listening sockets for the same user happen to hash to the same listener b

[PATCH net] soreuseport: Fix TCP listener hash collision

2016-04-28 Thread Craig Gallek
From: Craig Gallek I forgot to include a check for listener port equality when deciding if two sockets should belong to the same reuseport group. This was not caught previously because it's only necessary when two listening sockets for the same user happen to hash to the same listener b

Re: net merged into net-next

2016-04-25 Thread Craig Gallek
Thanks David, There was one other change that conflicts (functionally) with this merge as well: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood") It did a similar hlist_nulls -> hlist transform for the TCP stack. I'll send a formal patch to address this as well. Craig On S

[PATCH net-next] soreuseport: Resolve merge conflict for v4/v6 ordering fix

2016-04-25 Thread Craig Gallek
From: Craig Gallek d894ba18d4e4 ("soreuseport: fix ordering for mixed v4/v6 sockets") was merged as a bug fix to the net tree. Two conflicting changes were committed to net-next before the above fix was merged back to net-next: ca065d0cf80f ("udp: no longer use SLAB

[RFC net-next] soreuseport: fix ordering for mixed v4/v6 sockets

2016-04-15 Thread Craig Gallek
From: Craig Gallek With the SO_REUSEPORT socket option, it is possible to create sockets in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address. This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on the AF_INET6 sockets. Prior to the commits referenced

[PATCH net 2/2] soreuseport: test mixed v4/v6 sockets

2016-04-12 Thread Craig Gallek
From: Craig Gallek Test to validate the behavior of SO_REUSEPORT sockets that are created with both AF_INET and AF_INET6. See the commit prior to this for a description of this behavior. Signed-off-by: Craig Gallek --- tools/testing/selftests/net/.gitignore| 1 + tools/testing

[PATCH net 0/2] Fixes for SO_REUSEPORT and mixed v4/v6 sockets

2016-04-12 Thread Craig Gallek
From: Craig Gallek Recent changes to the datastructures associated with SO_REUSEPORT broke an existing behavior when equivalent SO_REUSEPORT sockets are created using both AF_INET and AF_INET6. This patch series restores the previous behavior and includes a test to validate it. This series

[PATCH net 1/2] soreuseport: fix ordering for mixed v4/v6 sockets

2016-04-12 Thread Craig Gallek
From: Craig Gallek With the SO_REUSEPORT socket option, it is possible to create sockets in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address. This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on the AF_INET6 sockets. Prior to the commits referenced

Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as drain mode

2016-03-25 Thread Craig Gallek
On Fri, Mar 25, 2016 at 12:21 PM, Alexei Starovoitov wrote: > On Fri, Mar 25, 2016 at 11:29:10AM -0400, Craig Gallek wrote: >> On Thu, Mar 24, 2016 at 2:00 PM, Willy Tarreau wrote: >> > The pattern is : >> > >> > t0 : unprivileged processes 1

Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as drain mode

2016-03-25 Thread Craig Gallek
On Thu, Mar 24, 2016 at 2:00 PM, Willy Tarreau wrote: > The pattern is : > > t0 : unprivileged processes 1 and 2 are listening to the same port >(sock1@pid1) (sock2@pid2) ><-- listening --> > > t1 : new processes are started to replace the old ones >(sock1@pid1)

Re: [PATCH v2] socket.7: Document some BPF-related socket options

2016-03-01 Thread Craig Gallek
On Tue, Mar 1, 2016 at 5:29 AM, Michael Kerrisk (man-pages) wrote: > On 03/01/2016 11:10 AM, Vincent Bernat wrote: >> ❦ 1 mars 2016 11:03 +0100, "Michael Kerrisk (man-pages)" >> : >> >>> Once the SO_LOCK_FILTER option has been enabled, >>> attempts by an unprivilege

[PATCH v2] socket.7: Document some BPF-related socket options

2016-02-29 Thread Craig Gallek
From: Craig Gallek Document the behavior and the first kernel version for each of the following socket options: SO_ATTACH_FILTER SO_ATTACH_BPF SO_ATTACH_REUSEPORT_CBPF SO_ATTACH_REUSEPORT_EBPF SO_DETACH_FILTER SO_DETACH_BPF SO_LOCK_FILTER Signed-off-by: Craig Gallek --- v2 changes: - Content

[PATCH] socket.7: Document some BPF-related socket options

2016-02-25 Thread Craig Gallek
From: Craig Gallek Document the behavior and the first kernel version for each of the following socket options: SO_ATTACH_FILTER SO_ATTACH_BPF SO_ATTACH_REUSEPORT_CBPF SO_ATTACH_REUSEPORT_EBPF SO_DETACH_FILTER SO_DETACH_BPF Signed-off-by: Craig Gallek --- man7/socket.7 | 104

[PATCH net-next] soreuseport: fix merge conflict in tcp bind

2016-02-22 Thread Craig Gallek
From: Craig Gallek One of the validation checks for the new array-based TCP SO_REUSEPORT validation was unintentionally dropped in ea8add2b1903. This adds it back. Lack of this check allows the user to allocate multiple sock_reuseport structures (leaking all but the first). Fixes

[PATCH net-next v4 5/7] soreuseport: Prep for fast reuseport TCP socket selection

2016-02-10 Thread Craig Gallek
From: Craig Gallek Both of the lines in this patch probably should have been included in the initial implementation of this code for generic socket support, but weren't technically necessary since only UDP sockets were supported. First, the sk_reuseport_cb points to a structure which as

[PATCH net-next v4 6/7] soreuseport: fast reuseport TCP socket selection

2016-02-10 Thread Craig Gallek
From: Craig Gallek This change extends the fast SO_REUSEPORT socket lookup implemented for UDP to TCP. Listener sockets with SO_REUSEPORT and the same receive address are additionally added to an array for faster random access. This means that only a single socket from the group must be found
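
A rough model of the array-based selection described here, as a sketch only. The class and method names are hypothetical, and the real kernel code also handles BPF-based selection and locking, which are omitted. The index computation mirrors the multiply-and-shift idea behind the kernel's `reciprocal_scale()` helper, which maps a 32-bit hash into a range without a division.

```python
def reciprocal_scale(val, n):
    """Map a 32-bit value into [0, n) as (val * n) >> 32, with no division."""
    return ((val & 0xFFFFFFFF) * n) >> 32


class ReuseportGroup:
    """Hypothetical stand-in for a group of SO_REUSEPORT sockets.

    Sockets sharing a receive address sit in one array, so after any
    member of the group is found, the final choice is a single index.
    """

    def __init__(self):
        self.socks = []

    def add(self, sock):
        self.socks.append(sock)

    def select(self, flow_hash):
        """Pick a member from the packet's flow hash, or None if empty."""
        if not self.socks:
            return None
        return self.socks[reciprocal_scale(flow_hash, len(self.socks))]


group = ReuseportGroup()
for name in ("listener0", "listener1", "listener2", "listener3"):
    group.add(name)
chosen = group.select(0x80000000)
```

The point of the array is that selection cost no longer depends on walking every socket bound to the port: one lookup finds the group, and one indexed access picks the member.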

[PATCH net-next v4 7/7] soreuseport: BPF selection functional test for TCP

2016-02-10 Thread Craig Gallek
From: Craig Gallek Unfortunately the existing test relied on packet payload in order to map incoming packets to sockets. In order to get this to work with TCP, TCP_FASTOPEN needed to be used. Since the fast open path is slightly different than the standard TCP path, I created a second test

[PATCH net-next v4 0/7] Faster SO_REUSEPORT for TCP

2016-02-10 Thread Craig Gallek
From: Craig Gallek This patch series complements an earlier series (6a5ef90c58da) which added faster SO_REUSEPORT lookup for UDP sockets by extending the feature to TCP sockets. It uses the same array-based data structure which allows for socket selection after finding the first listening

[PATCH net-next v4 1/7] sock: struct proto hash function may error

2016-02-10 Thread Craig Gallek
From: Craig Gallek In order to support fast reuseport lookups in TCP, the hash function defined in struct proto must be capable of returning an error code. This patch changes the function signature of all related hash functions to return an integer and handles or propagates this return value at

[PATCH net-next v4 4/7] inet: refactor inet[6]_lookup functions to take skb

2016-02-10 Thread Craig Gallek
From: Craig Gallek This is a preliminary step to allow fast socket lookup of SO_REUSEPORT groups. Doing so with a BPF filter will require access to the skb in question. This change plumbs the skb (and offset to payload data) through the call stack to the listening socket lookup implementations

[PATCH net-next v4 3/7] tcp: __tcp_hdrlen() helper

2016-02-10 Thread Craig Gallek
From: Craig Gallek tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr. This splits the size calculation into a helper function that can be used if a struct tcphdr is already available. Signed-off-by: Craig Gallek --- include/linux/tcp.h | 7 ++- 1 file changed, 6
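
The shape of this split can be sketched in userspace as follows. This is an illustration over raw bytes, not the kernel code: the function names and the `transport_offset` parameter are assumptions made for the example. The inner helper computes the length from a header the caller already holds, and the outer function only locates the header before delegating.

```python
import struct

TCP_MIN_HDR = 20  # bytes in a TCP header without options


def hdrlen_from_tcp_header(th):
    """Header length in bytes, given the raw TCP header itself.

    The data offset is the upper 4 bits of byte 12, counted in
    32-bit words, so the length is that value times 4.
    """
    if len(th) < TCP_MIN_HDR:
        raise ValueError("truncated TCP header")
    return (th[12] >> 4) * 4


def hdrlen_from_packet(pkt, transport_offset):
    """Locate the TCP header inside a packet, then reuse the helper."""
    return hdrlen_from_tcp_header(pkt[transport_offset:])


def make_tcp_header(doff):
    """Build a 20-byte fixed header with the given data offset (test aid)."""
    return struct.pack("!HHIIBBHHH", 1234, 80, 0, 0, doff << 4, 0x02, 65535, 0, 0)


plain = make_tcp_header(5)
with_options = make_tcp_header(8) + b"\x00" * 12
```

A caller that has already parsed the header calls `hdrlen_from_tcp_header` directly and skips the lookup step, which is the saving the patch description refers to.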

[PATCH net-next v4 2/7] inet: create IPv6-equivalent inet_hash function

2016-02-10 Thread Craig Gallek
From: Craig Gallek In order to support fast lookups for TCP sockets with SO_REUSEPORT, the function that adds sockets to the listening hash set needs to be able to check receive address equality. Since this equality check is different for IPv4 and IPv6, we will need two different socket hashing

[PATCH net-next v3 6/7] soreuseport: fast reuseport TCP socket selection

2016-02-09 Thread Craig Gallek
From: Craig Gallek This change extends the fast SO_REUSEPORT socket lookup implemented for UDP to TCP. Listener sockets with SO_REUSEPORT and the same receive address are additionally added to an array for faster random access. This means that only a single socket from the group must be found

[PATCH net-next v3 3/7] tcp: __tcp_hdrlen() helper

2016-02-09 Thread Craig Gallek
From: Craig Gallek tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr. This splits the size calculation into a helper function that can be used if a struct tcphdr is already available. Signed-off-by: Craig Gallek --- include/linux/tcp.h | 7 ++- 1 file changed, 6

[PATCH net-next v3 4/7] inet: refactor inet[6]_lookup functions to take skb

2016-02-09 Thread Craig Gallek
From: Craig Gallek This is a preliminary step to allow fast socket lookup of SO_REUSEPORT groups. Doing so with a BPF filter will require access to the skb in question. This change plumbs the skb (and offset to payload data) through the call stack to the listening socket lookup implementations
