nal cost for the BPF case and just a single branch
for the unconnected udp, tcp listener case!
Acked-by: Craig Gallek
Pv4, which has passed a NULL skb pointer to
> reuseport_select_sock().
>
> Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
> Cc: Craig Gallek
> Signed-off-by: Martin KaFai Lau
Acked-by: Craig Gallek
udo ./tools/testing/selftests/bpf/test_lpm_map
> test_lpm_map: test_lpm_map.c:485: test_lpm_delete: Assertion
> `bpf_map_delete_elem(map_fd, key) == -1 && errno == ENOENT' failed.
> Aborted
>
> With the patch: test_lpm_map runs without errors.
>
> Fixes: e454
count:0 mapping:88018a02c140 index:0x0
> compound_mapcount: 0
> flags: 0x2fffc008100(slab|head)
> raw: 02fffc0000008100 88018a02c140 00010001
> raw: ea00062a1320 ea0006268020 8801d9bdde40
> page dumped because: kasan: bad access detected
>
> Fixes: b922622ec6ef ("sock_diag: don't broadcast kernel sockets")
> Signed-off-by: Eric Dumazet
> Cc: Craig Gallek
> Reported-by: syzbot
Acked-by: Craig Gallek
Thanks Eric!
> Signed-off-by: Eric Dumazet
> Reported-by: syzbot+c0ea2226f77a42936...@syzkaller.appspotmail.com
Clever fix, thanks Eric(s)!
Acked-by: Craig Gallek
2ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
> Signed-off-by: Martin KaFai Lau
Wow, good catch!
Acked-by: Craig Gallek
On Sat, Dec 23, 2017 at 5:12 PM, Nicolas Dichtel
wrote:
> On 22/12/2017 at 21:36, Craig Gallek wrote:
>> From: Craig Gallek
>> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>> index 60a71be75aea..4b7ea33f5705 100644
>> --- a/net/core/net_na
From: Craig Gallek
netns ids were added in commit 0c7aecd4bde4 and defined as signed
integers in both the kernel datastructures and the netlink interface.
However, the semantics of the implementation assume that the ids
are always greater than or equal to zero, except for an internal
sentinel
On Fri, Dec 22, 2017 at 8:59 AM, Craig Gallek wrote:
> On Fri, Dec 22, 2017 at 3:11 AM, Nicolas Dichtel
> wrote:
>> On 21/12/2017 at 23:18, Craig Gallek wrote:
>>> From: Craig Gallek
>>>
>>> The below referenced commit extended the RTM_GETLINK interface
On Fri, Dec 22, 2017 at 3:11 AM, Nicolas Dichtel
wrote:
> On 21/12/2017 at 23:18, Craig Gallek wrote:
>> From: Craig Gallek
>>
>> The below referenced commit extended the RTM_GETLINK interface to
>> allow querying by netns id. The netnsid property was previously
>
From: Craig Gallek
The below referenced commit extended the RTM_GETLINK interface to
allow querying by netns id. The netnsid property was previously
defined as a signed integer, but this patch assumes that the user
always passes a positive integer. syzkaller discovered this problem
by setting
On Tue, Dec 12, 2017 at 8:09 AM, Paolo Abeni wrote:
> When a reuseport socket group is using a BPF filter to distribute
> the packets among the sockets, we don't need to compute any hash
> value, but the current reuseport_select_sock() requires the
> caller to compute such hash in advance.
>
> Thi
On Tue, Dec 5, 2017 at 3:07 PM, Eric Dumazet wrote:
> On Tue, 2017-12-05 at 14:39 -0500, Craig Gallek wrote:
>> On Tue, Dec 5, 2017 at 9:18 AM, Eric Dumazet
>> wrote:
>> > On Tue, 2017-12-05 at 06:15 -0800, Eric Dumazet wrote:
>> > >
>> > > + h
On Tue, Dec 5, 2017 at 9:18 AM, Eric Dumazet wrote:
> On Tue, 2017-12-05 at 06:15 -0800, Eric Dumazet wrote:
>>
>> + hlist_nulls_add_head_rcu(&sk->sk_nulss_node, list);
>
> Typo here, this needs sk_nulls_node of course.
>
Thanks Eric, this looks good to me. The tail insertion is still
requir
ody, so that we can drop some duplicate
> code in the ipv4 and ipv6 stack.
>
> This also allows faster lookup in the above scenario and will allow
> us to avoid computing the hash value for successful, BPF based
> demultiplexing - in a later patch.
>
> Signed-off-by: Paolo Aben
From: Craig Gallek
do_check() can fail early without allocating env->cur_state under
memory pressure. Syzkaller found the stack below on the linux-next
tree because of this.
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
gene
On Thu, Nov 2, 2017 at 11:07 AM, Alexei Starovoitov wrote:
> On 11/2/17 7:21 AM, Craig Gallek wrote:
>>
>> From: Craig Gallek
>>
>> do_check() can fail early without allocating env->cur_state under
>> memory pressure. Syzkaller found the stack below on th
From: Craig Gallek
do_check() can fail early without allocating env->cur_state under
memory pressure. Syzkaller found the stack below on the linux-next
tree because of this.
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
gene
From: Craig Gallek
Syzkaller found several variants of the lockup below by setting negative
values with the TUNSETSNDBUF ioctl. This patch adds a sanity check
to both the tun and tap versions of this ioctl.
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [repro:2389]
Modules linked in
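A minimal sketch of the kind of sanity check described above, assuming the fix simply rejects non-positive send buffer values before they are stored. The helper name check_sndbuf_sketch is hypothetical and is not taken from the tun or tap drivers.

```c
#include <errno.h>

/*
 * Hypothetical helper (not the driver code): validate a user-supplied
 * send buffer size before it is stored on the socket.
 * Returns 0 when the value is usable, -EINVAL otherwise.
 */
static int check_sndbuf_sketch(int sndbuf)
{
	/* Assumption: zero and negative sizes are both rejected. */
	if (sndbuf <= 0)
		return -EINVAL;
	return 0;
}
```

An ioctl handler would call a check like this right after copying the integer from user space and return the error without touching the socket.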
From: Craig Gallek
Syzkaller stumbled upon a way to trigger
WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41
reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39
There are two initialization paths for the sock_reuseport structure in a
socket: Through the udp/tcp bind paths of
From: Craig Gallek
This library previously assumed a fixed-size map options structure.
Any new options were ignored. In order to allow the options structure
to grow and to support parsing older programs, this patch updates
the maps section parsing to handle varying sizes.
Object files with
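One way to read the description above, as a sketch rather than the actual libbpf code: copy as many bytes as both sides understand, zero-fill fields the object file did not provide, and refuse a larger definition only when its unknown trailing bytes are non-zero. The struct and function names below are hypothetical, and the trailing-bytes policy is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the library's map definition structure. */
struct map_def_sketch {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;	/* newer field, absent in older objects */
};

/*
 * Copy a map definition of src_sz bytes into dst.
 * Smaller (older) definitions leave the missing fields zeroed.
 * Larger (newer) definitions are accepted only when every byte
 * beyond the known structure is zero (assumed policy).
 */
static int copy_map_def_sketch(struct map_def_sketch *dst,
			       const void *src, size_t src_sz)
{
	const unsigned char *bytes = src;
	size_t i;

	memset(dst, 0, sizeof(*dst));
	if (src_sz <= sizeof(*dst)) {
		memcpy(dst, src, src_sz);
		return 0;
	}
	for (i = sizeof(*dst); i < src_sz; i++)
		if (bytes[i] != 0)
			return -EINVAL;
	memcpy(dst, src, sizeof(*dst));
	return 0;
}
```

With this shape, an object built against a four-field definition loads with map_flags equal to zero, while an object that sets an option the library does not know about is rejected instead of being silently misread.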
From: Craig Gallek
The functional change to this series is the ability to use flags when
creating maps from object files loaded by libbpf. In order to do this,
the first patch updates the library to handle map definitions that
differ in size from libbpf's struct bpf_map_def.
For object
From: Craig Gallek
This is required to use BPF_MAP_TYPE_LPM_TRIE or any other map type
which requires flags.
Signed-off-by: Craig Gallek
---
tools/lib/bpf/libbpf.c | 2 +-
tools/lib/bpf/libbpf.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/lib/bpf/libbpf.c b
On Tue, Oct 3, 2017 at 10:39 AM, Daniel Borkmann wrote:
> On 10/03/2017 01:07 AM, Alexei Starovoitov wrote:
>>
>> On 10/2/17 9:41 AM, Craig Gallek wrote:
>>>
>>> +/* Assume equally sized map definitions */
>>> +map_def_sz = data->d_size /
On Tue, Oct 3, 2017 at 10:11 AM, Jesper Dangaard Brouer
wrote:
> On Mon, 2 Oct 2017 12:41:28 -0400
> Craig Gallek wrote:
>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 4f402dcdf372..28b300868ad7 100644
>> --- a/tools/lib/bpf/libbpf.c
>
On Tue, Oct 3, 2017 at 10:11 AM, Jesper Dangaard Brouer
wrote:
> On Mon, 2 Oct 2017 12:41:28 -0400
> Craig Gallek wrote:
>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 4f402dcdf372..28b300868ad7 100644
>> --- a/tools/lib/bpf/libbpf.c
>
On Tue, Oct 3, 2017 at 10:03 AM, Jesper Dangaard Brouer
wrote:
>
>
> First of all, thank you Craig for working on this. As Alexei says, we
> need to improve tools/lib/bpf/libbpf and move towards converting users
> of bpf_load.c to this lib instead.
>
> Comments inlined below.
>
>> +
From: Craig Gallek
This is required to use BPF_MAP_TYPE_LPM_TRIE or any other map type
which requires flags.
Signed-off-by: Craig Gallek
---
tools/lib/bpf/libbpf.c | 2 +-
tools/lib/bpf/libbpf.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/lib/bpf/libbpf.c b
From: Craig Gallek
This library previously assumed a fixed-size map options structure.
Any new options were ignored. In order to allow the options structure
to grow and to support parsing older programs, this patch updates
the maps section parsing to handle varying sizes.
Object files with
From: Craig Gallek
The functional change to this series is the ability to use flags when
creating maps from object files loaded by libbpf. In order to do this,
the first patch updates the library to handle map definitions that
differ in size from libbpf's struct bpf_map_def.
For object
On Wed, Sep 27, 2017 at 6:03 PM, Daniel Borkmann wrote:
> On 09/27/2017 06:29 PM, Alexei Starovoitov wrote:
>>
>> On 9/27/17 7:04 AM, Craig Gallek wrote:
>>>
>>> From: Craig Gallek
>>>
>>> This extends struct bpf_map_def to include a flags fi
From: Craig Gallek
This extends struct bpf_map_def to include a flags field. Note that
this has the potential to break the validation logic in
bpf_object__validate_maps and bpf_object__init_maps as they use
sizeof(struct bpf_map_def) as a minimal allowable size of a map section.
Any bpf program
From: Craig Gallek
Before the delete operator was added, this datastructure maintained
an invariant that intermediate nodes were only present when necessary
to build the tree. This patch updates the delete operation to reinstate
that invariant by removing unnecessary intermediate nodes after a
On Wed, Sep 20, 2017 at 6:56 PM, Daniel Mack wrote:
> On 09/20/2017 08:51 PM, Craig Gallek wrote:
>> On Wed, Sep 20, 2017 at 12:51 PM, Daniel Mack wrote:
>>> Hi Craig,
>>>
>>> Thanks, this looks much cleaner already :)
>>>
>>> On 09/20/2017
On Wed, Sep 20, 2017 at 12:51 PM, Daniel Mack wrote:
> Hi Craig,
>
> Thanks, this looks much cleaner already :)
>
> On 09/20/2017 06:22 PM, Craig Gallek wrote:
>> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
>> index 9d58a576b2ae..b5a7d70ec8b5 100644
>
From: Craig Gallek
Before the delete operator was added, this datastructure maintained
an invariant that intermediate nodes were only present when necessary
to build the tree. This patch updates the delete operation to reinstate
that invariant by removing unnecessary intermediate nodes after a
On Tue, Sep 19, 2017 at 5:13 PM, Daniel Mack wrote:
> On 09/19/2017 10:55 PM, David Miller wrote:
>> From: Craig Gallek
>> Date: Mon, 18 Sep 2017 15:30:54 -0400
>>
>>> This was previously left as a TODO. Add the implementation and
>>> extend the test to
On Mon, Sep 18, 2017 at 6:53 PM, Alexei Starovoitov wrote:
Thanks for the review! Please correct me if I'm wrong...
> On 9/18/17 12:30 PM, Craig Gallek wrote:
>>
>> From: Craig Gallek
>>
>> This is a simple non-recursive delete operation. It prunes paths
>>
From: Craig Gallek
Extend the 'random' operation tests to include a delete operation
(delete half of the nodes from both lpm implementations and ensure
that lookups are still equivalent).
Also, add a simple IPv4 test which verifies lookup behavior as nodes
are deleted from the tree.
From: Craig Gallek
This was previously left as a TODO. Add the implementation and
extend the test to cover it.
Craig Gallek (3):
bpf: Implement map_delete_elem for BPF_MAP_TYPE_LPM_TRIE
bpf: Add uniqueness invariant to trivial lpm test implementation
bpf: Test deletion in
From: Craig Gallek
The 'trivial' lpm implementation in this test allows equivalent nodes
to be added (that is, nodes consisting of the same prefix and prefix
length). For lookup operations, this is fine because insertion happens
at the head of the (singly linked) list and the first,
From: Craig Gallek
This is a simple non-recursive delete operation. It prunes paths
of empty nodes in the tree, but it does not try to further compress
the tree as nodes are removed.
Signed-off-by: Craig Gallek
---
kernel/bpf/lpm_trie.c | 80
From: Craig Gallek
A recent change to fix up DSA device behavior made the assumption that
all skbs passing through the flow dissector will be associated with a
device. This does not appear to be a safe assumption. Syzkaller found
the crash below by attaching a BPF socket filter that tries to
On Wed, Jun 21, 2017 at 12:51 PM, Lawrence Brakmo wrote:
>
> On 6/20/17, 2:25 PM, "Craig Gallek" wrote:
>
> On Mon, Jun 19, 2017 at 11:00 PM, Lawrence Brakmo wrote:
> > Added support for calling a subset of socket setsockopts from
> > BPF_PROG_TY
On Mon, Jun 19, 2017 at 11:00 PM, Lawrence Brakmo wrote:
> Added support for calling a subset of socket setsockopts from
> BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather
> than making the changes to call the socket setsockopt function because
> the changes required would have been
On Fri, Jun 2, 2017 at 2:25 PM, Craig Gallek wrote:
> On Fri, Jun 2, 2017 at 2:05 PM, David Miller wrote:
>> From: Ben Hutchings
>> Date: Wed, 31 May 2017 13:26:02 +0100
>>
>>> If I'm not mistaken, ipv6_gso_segment() now leaks segs if
>>> ip6_find_1
On Fri, Jun 2, 2017 at 2:05 PM, David Miller wrote:
> From: Ben Hutchings
> Date: Wed, 31 May 2017 13:26:02 +0100
>
>> If I'm not mistaken, ipv6_gso_segment() now leaks segs if
>> ip6_find_1stfragopt() fails. I'm not sure whether the fix would be as
>> simple as adding a kfree_skb(segs) or wheth
implementations to the original
ip6_find_1stfragopt and may very well suffer from the same bug I was
trying to fix. Maybe it doesn't matter since that bug relied on the
user changing the v6 nexthdr field. I need to understand the mip6
code first...
In any event, I think this patch applies on its own. Thanks again.
Acked-by: Craig Gallek
On Wed, May 17, 2017 at 10:58 PM, David Miller wrote:
> From: Julia Lawall
> Date: Thu, 18 May 2017 10:01:07 +0800 (SGT)
>
>> It may be worth checking on these. The code context is shown in the first
>> case (line 120). For the others, at least it gives the line numbers.
> ...
net/ipv6/ip
From: Craig Gallek
The KASAN warning reported below was discovered with a syzkaller
program. The reproducer is basically:
int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP);
send(s, &one_byte_of_data, 1, MSG_MORE);
send(s, &more_than_mtu_bytes_data, 2000, 0);
The socket() call
ler
Woops, sorry I missed this. Thanks for the fix!
Acked-by: Craig Gallek
From: Craig Gallek
The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
IPV6_TLV_PADN options when an encapsulation limit is defined (the
default is a limit of 4). An MTU adjustment is done to account for
these options as well. However, the options are never present in the
On Wed, Apr 26, 2017 at 1:07 PM, Craig Gallek wrote:
> From: Craig Gallek
>
> The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
> IPV6_TLV_PADN options when an encapsulation limit is defined (the
> default is a limit of 4). An MTU adjustment is done to acco
From: Craig Gallek
The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
IPV6_TLV_PADN options when an encapsulation limit is defined (the
default is a limit of 4). An MTU adjustment is done to account for
these options as well. However, the options are never present in the
From: Craig Gallek
Fixes: af89576d7a8c ("iproute2: GRE over IPv6 tunnel support.")
Signed-off-by: Craig Gallek
---
ip/link_gre6.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/ip/link_gre6.c b/ip/link_gre6.c
index a91f635760fa..1b4fb051b37f 100644
--- a/ip/l
From: Craig Gallek
This attribute allows the administrator to adjust the packet marking
attribute of tunnels that support policy based routing.
Signed-off-by: Craig Gallek
---
include/linux/if_tunnel.h | 3 +++
ip/link_gre.c | 16
ip/link_gre6.c | 24
From: Craig Gallek
This feature allows the administrator to set an fwmark for
packets traversing a tunnel. This allows the use of independent
routing tables for tunneled packets without the use of iptables.
Signed-off-by: Craig Gallek
---
include/net/ip6_tunnel.h | 2 ++
include/uapi
From: Craig Gallek
This feature allows the administrator to set an fwmark for
packets traversing a tunnel. This allows the use of independent
routing tables for tunneled packets without the use of iptables.
There is no concept of per-packet routing decisions through IPv4
tunnels, so this
From: Craig Gallek
iproute2 changes to follow. Example usage:
ip link add gre-test type gre local 10.0.0.1 remote 10.0.0.2 fwmark 0x4
ip -detail link show gre-test
...
ip link set gre-test type gre fwmark 0
Craig Gallek (2):
ip6_tunnel: Allow policy-based routing through tunnels
On Sun, Apr 2, 2017 at 6:18 PM, Alexey Dobriyan wrote:
> Number of sockets is limited by 16-bit, so 64-bit allocation will never
> happen.
>
> 16-bit ops are the worst code density-wise on x86_64 because of
> additional prefix (66).
So this boils down to a compiled code density vs a
readability/ma
On Tue, Mar 28, 2017 at 1:19 PM, Andrey Konovalov wrote:
> On Tue, Mar 28, 2017 at 5:54 PM, Craig Gallek wrote:
>> On Tue, Mar 28, 2017 at 10:00 AM, Andrey Konovalov
>> wrote:
>>> When calculating rb->frames_per_block * req->tp_block_nr the result
>>>
On Tue, Mar 28, 2017 at 10:00 AM, Andrey Konovalov
wrote:
> When calculating rb->frames_per_block * req->tp_block_nr the result
> can overflow.
>
> Add a check that tp_block_size * tp_block_nr <= UINT_MAX.
>
> Since frames_per_block <= tp_block_size, the expression would
> never overflow.
>
> Sign
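The bound quoted above (tp_block_size * tp_block_nr <= UINT_MAX) can be tested without performing the multiplication that might wrap. This is a sketch with a hypothetical helper name, not the code from the patch.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper: report whether block_size * block_nr fits in
 * an unsigned int. Dividing UINT_MAX by block_nr avoids computing a
 * product that could overflow.
 */
static bool ring_size_fits_sketch(unsigned int block_size,
				  unsigned int block_nr)
{
	if (block_nr == 0)
		return true;	/* the product is zero */
	return block_size <= UINT_MAX / block_nr;
}
```

Because frames_per_block is stated to be no larger than tp_block_size, passing this check also bounds frames_per_block * tp_block_nr.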
On Wed, Jan 11, 2017 at 3:19 PM, Josef Bacik wrote:
> +int inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2,
> +bool match_wildcard)
> +{
> +#if IS_ENABLED(CONFIG_IPV6)
> + if (sk->sk_family == AF_INET6)
Still wrapping my head around this, so take it
On Tue, Dec 20, 2016 at 3:07 PM, Josef Bacik wrote:
> If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 and
> never set it again. Which means that in the future if we end up adding a bunch
> of reuseport sk's to that tb we'll have to do the expensive scan every tim
On Thu, Dec 15, 2016 at 5:39 PM, Tom Herbert wrote:
> On Thu, Dec 15, 2016 at 10:53 AM, Josef Bacik wrote:
>> On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote:
>>>
>>> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek
>>> wrote:
>>>>
>>>>
On Wed, Dec 14, 2016 at 7:54 PM, Tom Herbert wrote:
> A user may call listen without binding an explicit port with the intent
> that the kernel will assign an available port to the socket. In this
> case inet_csk_get_port does a port scan. For such sockets, the user may
> also set soreuseport with th
On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote:
> I think there may be some suspicious code in inet_csk_get_port. At
> tb_found there is:
>
> if (((tb->fastreuse > 0 && reuse) ||
> (tb->fastreuseport > 0 &&
> !rcu_access_pointer(sk->sk
From: Craig Gallek
As part of a series to implement faster SO_REUSEPORT lookups,
commit 086c653f5862 ("sock: struct proto hash function may error")
added return values to protocol hash functions and
commit 496611d7b5ea ("inet: create IPv6-equivalent inet_hash function")
im
On Thu, Jul 7, 2016 at 4:36 PM, Jiri Kosina wrote:
> From: Jiri Kosina
>
> Convert the per-device linked list into a hashtable. The primary
> motivation for this change is that currently, we're not tracking all the
> qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
> performed o
From: Craig Gallek
The referenced change added a netlink notifier for processing
device queue size events. These events are fired for all devices
but the registered callback assumed they only occurred for tun
devices. This fix adds a check (borrowed from macvtap.c) to discard
non-tun device
On Thu, Jun 30, 2016 at 2:45 AM, Jason Wang wrote:
> Hi all:
>
> This series tries to switch to use skb array in tun. This is used to
> eliminate the spinlock contention between producer and consumer. The
> conversion was straightforward: just introduce a tx skb array and use
> it instead of sk_rec
On Fri, Jun 3, 2016 at 5:09 PM, Helge Deller wrote:
> Any idea for a better naming than "do_sockopt_fix_sock_fprog()" ?
Thanks for catching and fixing this. I'd suggest simply leaving the
function name as-is. Your fix to the condition in that function is
sufficient to address the issue.
Craig
fferent
> size [-Wpointer-to-int-cast]
>
> Signed-off-by: Helge Deller
Acked-by: Craig Gallek
Thanks!
From: Craig Gallek
I forgot to include a check for listener port equality when deciding
if two sockets should belong to the same reuseport group. This was
not caught previously because it's only necessary when two listening
sockets for the same user happen to hash to the same listener b
On Thu, Apr 28, 2016 at 5:59 PM, Eric Dumazet wrote:
> On Thu, 2016-04-28 at 17:07 -0400, Craig Gallek wrote:
>> From: Craig Gallek
>>
>> I forgot to include a check for listener port equality when deciding
>> if two sockets should belong to the same reuseport gro
From: Craig Gallek
I forgot to include a check for listener port equality when deciding
if two sockets should belong to the same reuseport group. This was
not caught previously because it's only necessary when two listening
sockets for the same user happen to hash to the same listener b
From: Craig Gallek
I forgot to include a check for listener port equality when deciding
if two sockets should belong to the same reuseport group. This was
not caught previously because it's only necessary when two listening
sockets for the same user happen to hash to the same listener b
Thanks David,
There was one other change that conflicts (functionally) with this
merge as well: 3b24d854cb35 ("tcp/dccp: do not touch listener
sk_refcnt under synflood")
It did a similar hlist_nulls -> hlist transform for the TCP stack.
I'll send a formal patch to address this as well.
Craig
On S
From: Craig Gallek
d894ba18d4e4 ("soreuseport: fix ordering for mixed v4/v6 sockets")
was merged as a bug fix to the net tree. Two conflicting changes
were committed to net-next before the above fix was merged back to
net-next:
ca065d0cf80f ("udp: no longer use SLAB
From: Craig Gallek
With the SO_REUSEPORT socket option, it is possible to create sockets
in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address.
This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on
the AF_INET6 sockets.
Prior to the commits referenced
From: Craig Gallek
Test to validate the behavior of SO_REUSEPORT sockets that are
created with both AF_INET and AF_INET6. See the commit prior to this
for a description of this behavior.
Signed-off-by: Craig Gallek
---
tools/testing/selftests/net/.gitignore | 1 +
tools/testing
From: Craig Gallek
Recent changes to the datastructures associated with SO_REUSEPORT broke
an existing behavior when equivalent SO_REUSEPORT sockets are created
using both AF_INET and AF_INET6. This patch series restores the previous
behavior and includes a test to validate it.
This series
From: Craig Gallek
With the SO_REUSEPORT socket option, it is possible to create sockets
in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address.
This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on
the AF_INET6 sockets.
Prior to the commits referenced
On Fri, Mar 25, 2016 at 12:21 PM, Alexei Starovoitov
wrote:
> On Fri, Mar 25, 2016 at 11:29:10AM -0400, Craig Gallek wrote:
>> On Thu, Mar 24, 2016 at 2:00 PM, Willy Tarreau wrote:
>> > The pattern is :
>> >
>> > t0 : unprivileged processes 1
On Thu, Mar 24, 2016 at 2:00 PM, Willy Tarreau wrote:
> The pattern is :
>
> t0 : unprivileged processes 1 and 2 are listening to the same port
> (sock1@pid1) (sock2@pid2)
> <-- listening -->
>
> t1 : new processes are started to replace the old ones
> (sock1@pid1)
On Tue, Mar 1, 2016 at 5:29 AM, Michael Kerrisk (man-pages)
wrote:
> On 03/01/2016 11:10 AM, Vincent Bernat wrote:
>> ❦ 1 March 2016 11:03 +0100, "Michael Kerrisk (man-pages)"
>> :
>>
>>> Once the SO_LOCK_FILTER option has been enabled,
>>> attempts by an unprivilege
From: Craig Gallek
Document the behavior and the first kernel version for each of the
following socket options:
SO_ATTACH_FILTER
SO_ATTACH_BPF
SO_ATTACH_REUSEPORT_CBPF
SO_ATTACH_REUSEPORT_EBPF
SO_DETACH_FILTER
SO_DETACH_BPF
SO_LOCK_FILTER
Signed-off-by: Craig Gallek
---
v2 changes:
- Content
From: Craig Gallek
Document the behavior and the first kernel version for each of the
following socket options:
SO_ATTACH_FILTER
SO_ATTACH_BPF
SO_ATTACH_REUSEPORT_CBPF
SO_ATTACH_REUSEPORT_EBPF
SO_DETACH_FILTER
SO_DETACH_BPF
Signed-off-by: Craig Gallek
---
man7/socket.7 | 104
From: Craig Gallek
One of the validation checks for the new array-based TCP SO_REUSEPORT
validation was unintentionally dropped in ea8add2b1903. This adds it back.
Lack of this check allows the user to allocate multiple sock_reuseport
structures (leaking all but the first).
Fixes
From: Craig Gallek
Both of the lines in this patch probably should have been included
in the initial implementation of this code for generic socket
support, but weren't technically necessary since only UDP sockets
were supported.
First, the sk_reuseport_cb points to a structure which as
From: Craig Gallek
This change extends the fast SO_REUSEPORT socket lookup implemented
for UDP to TCP. Listener sockets with SO_REUSEPORT and the same
receive address are additionally added to an array for faster
random access. This means that only a single socket from the group
must be found
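To illustrate why an array gives cheap random access to a group member, here is a sketch that maps a 32-bit flow hash onto an array index with a multiply and a shift (the same arithmetic as the kernel's reciprocal_scale() helper). Whether this patch uses exactly that mapping is an assumption, and the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scale a 32-bit hash into [0, num_socks)
 * without a division. The 64-bit product is at most
 * (2^32 - 1) * num_socks, so the upper 32 bits are always
 * smaller than num_socks.
 */
static uint32_t pick_sock_index_sketch(uint32_t hash, uint32_t num_socks)
{
	return (uint32_t)(((uint64_t)hash * num_socks) >> 32);
}
```

Once one member of the group has been found, the caller can index the group's array with this value instead of walking every socket bound to the port.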
From: Craig Gallek
Unfortunately the existing test relied on packet payload in order to
map incoming packets to sockets. In order to get this to work with TCP,
TCP_FASTOPEN needed to be used.
Since the fast open path is slightly different than the standard TCP path,
I created a second test
From: Craig Gallek
This patch series complements an earlier series (6a5ef90c58da)
which added faster SO_REUSEPORT lookup for UDP sockets by
extending the feature to TCP sockets. It uses the same
array-based data structure which allows for socket selection
after finding the first listening
From: Craig Gallek
In order to support fast reuseport lookups in TCP, the hash function
defined in struct proto must be capable of returning an error code.
This patch changes the function signature of all related hash functions
to return an integer and handles or propagates this return value at
From: Craig Gallek
This is a preliminary step to allow fast socket lookup of SO_REUSEPORT
groups. Doing so with a BPF filter will require access to the
skb in question. This change plumbs the skb (and offset to payload
data) through the call stack to the listening socket lookup
implementations
From: Craig Gallek
tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr.
This splits the size calculation into a helper function that can be
used if a struct tcphdr is already available.
Signed-off-by: Craig Gallek
---
include/linux/tcp.h | 7 ++-
1 file changed, 6
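A sketch of the split described above, using a simplified stand-in for struct tcphdr so that it compiles outside the kernel. The names are hypothetical; the only fact relied on is that the TCP data offset field counts 32-bit words.

```c
/* Simplified stand-in for struct tcphdr: only the data offset field. */
struct tcphdr_sketch {
	unsigned int doff : 4;	/* header length in 32-bit words */
};

/*
 * Hypothetical helper: compute the header length in bytes from a
 * header pointer the caller already holds, so nothing has to be
 * looked up from the packet buffer again.
 */
static unsigned int hdrlen_sketch(const struct tcphdr_sketch *th)
{
	return th->doff * 4;
}
```

A wrapper that starts from the packet buffer can locate the header and then call this helper, so both entry points share one calculation.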
From: Craig Gallek
In order to support fast lookups for TCP sockets with SO_REUSEPORT,
the function that adds sockets to the listening hash set needs
to be able to check receive address equality. Since this equality
check is different for IPv4 and IPv6, we will need two different
socket hashing
From: Craig Gallek
This change extends the fast SO_REUSEPORT socket lookup implemented
for UDP to TCP. Listener sockets with SO_REUSEPORT and the same
receive address are additionally added to an array for faster
random access. This means that only a single socket from the group
must be found
From: Craig Gallek
tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr.
This splits the size calculation into a helper function that can be
used if a struct tcphdr is already available.
Signed-off-by: Craig Gallek
---
include/linux/tcp.h | 7 ++-
1 file changed, 6
From: Craig Gallek
This is a preliminary step to allow fast socket lookup of SO_REUSEPORT
groups. Doing so with a BPF filter will require access to the
skb in question. This change plumbs the skb (and offset to payload
data) through the call stack to the listening socket lookup
implementations