On 10 May 2018 at 22:00, Martin KaFai Lau <ka...@fb.com> wrote: > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote: >> This patch adds a new BPF helper function, sk_lookup() which allows BPF >> programs to find out if there is a socket listening on this host, and >> returns a socket pointer which the BPF program can then access to >> determine, for instance, whether to forward or drop traffic. sk_lookup() >> takes a reference on the socket, so when a BPF program makes use of this >> function, it must subsequently pass the returned pointer into the newly >> added sk_release() to return the reference. >> >> By way of example, the following pseudocode would filter inbound >> connections at XDP if there is no corresponding service listening for >> the traffic: >> >> struct bpf_sock_tuple tuple; >> struct bpf_sock_ops *sk; >> >> populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet >> sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0); >> if (!sk) { >> // Couldn't find a socket listening for this traffic. Drop. >> return TC_ACT_SHOT; >> } >> bpf_sk_release(sk, 0); >> return TC_ACT_OK; >> >> Signed-off-by: Joe Stringer <j...@wand.net.nz> >> ---
... >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto >> bpf_skb_get_xfrm_state_proto = { >> }; >> #endif >> >> +struct sock * >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) { > Would it be possible to have another version that > returns a sk without taking its refcnt? > It may have performance benefit. Not really. The sockets are not RCU-protected, and established sockets may be torn down without notice. If we don't take a reference, there's no guarantee that the socket will continue to exist for the duration of running the BPF program. >From what I follow, the comment below has a hidden implication which is that sockets without SOCK_RCU_FREE, eg established sockets, may be directly freed regardless of RCU. /* Sockets having SOCK_RCU_FREE will call this function after one RCU * grace period. This is the case for UDP sockets and TCP listeners. */ static void __sk_destruct(struct rcu_head *head) ... Therefore without the refcount, it won't be safe.