Hi Martin,

On 02/01/2019 08:03 AM, Martin KaFai Lau wrote:
> In kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)"
> before accessing the fields in sock.  For example, in __netdev_pick_tx:
>
> static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
>                             struct net_device *sb_dev)
> {
> 	/* ... */
>
> 	struct sock *sk = skb->sk;
>
> 	if (queue_index != new_index && sk &&
> 	    sk_fullsock(sk) &&
> 	    rcu_access_pointer(sk->sk_dst_cache))
> 		sk_tx_queue_set(sk, new_index);
>
> 	/* ... */
>
> 	return queue_index;
> }
>
> This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff"
> where a few of the convert_ctx_access() in filter.c have already been
> accessing the skb->sk sock_common's fields,
> e.g. sock_ops_convert_ctx_access().
>
> "__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
> Some of the fields in "bpf_sock" will not be directly
> accessible through the "__sk_buff->sk" pointer.  It is limited
> by the new "bpf_sock_common_is_valid_access()".
> e.g. The existing "type", "protocol", "mark" and "priority" in bpf_sock
>      are not allowed.
>
> The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
> can be used to get a sk with all accessible fields in "bpf_sock".
> This helper is added to both cg_skb and sched_(cls|act).
>
> int cg_skb_foo(struct __sk_buff *skb) {
> 	struct bpf_sock *sk;
> 	__u32 family;
>
> 	sk = skb->sk;
> 	if (!sk)
> 		return 1;
>
> 	sk = bpf_sk_fullsock(sk);
> 	if (!sk)
> 		return 1;
>
> 	if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
> 		return 1;
>
> 	/* some_traffic_shaping(); */
>
> 	return 1;
> }
>
> (1) The sk is read only
>
> (2) There is no new "struct bpf_sock_common" introduced.
>
> (3) Future kernel sock's members could be added to bpf_sock only
>     instead of repeatedly adding at multiple places like currently
>     in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc.
>
> (4) After "sk = skb->sk", the reg holding sk is in type
>     PTR_TO_SOCK_COMMON_OR_NULL.
>
> (5) After bpf_sk_fullsock(), the return type will be in type
>     PTR_TO_SOCKET_OR_NULL which is the same as the return type of
>     bpf_sk_lookup_xxx().
>
>     However, bpf_sk_fullsock() does not take refcnt.  The
>     acquire_reference_state() is only depending on the return type now.
>     To avoid it, a new is_acquire_function() is checked before calling
>     acquire_reference_state().
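
To make sure I read (5) correctly, here is a minimal userspace sketch of
the distinction, not verifier code. All enum, struct and function names
except is_acquire_function() are hypothetical stand-ins, and the refcount
tracking is reduced to a counter. It only shows why keying on the return
type alone would wrongly treat bpf_sk_fullsock() as taking a reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for helper IDs and helper return types. */
enum helper_id { HELPER_SK_LOOKUP_TCP, HELPER_SK_LOOKUP_UDP, HELPER_SK_FULLSOCK };
enum ret_type { RET_INTEGER, RET_PTR_TO_SOCKET_OR_NULL };

/* Reference tracking reduced to a counter (an assumption of this sketch). */
struct verifier_state { int acquired_refs; };

/* Only the lookup helpers take a reference on the returned socket. */
static bool is_acquire_function(enum helper_id id)
{
	return id == HELPER_SK_LOOKUP_TCP || id == HELPER_SK_LOOKUP_UDP;
}

/* Old behaviour: a reference is recorded based on the return type alone. */
static void check_call_by_ret_type(struct verifier_state *st, enum ret_type ret)
{
	if (ret == RET_PTR_TO_SOCKET_OR_NULL)
		st->acquired_refs++;
}

/* New behaviour: a reference is recorded only for acquiring helpers. */
static void check_call_by_helper(struct verifier_state *st, enum helper_id id)
{
	if (is_acquire_function(id))
		st->acquired_refs++;
}

/* Walks through both variants for a bpf_sk_fullsock()-like call. */
void demo_acquire_gate(void)
{
	struct verifier_state old_st = { 0 }, new_st = { 0 };

	/* Same return type as a lookup, so the old check records a bogus ref. */
	check_call_by_ret_type(&old_st, RET_PTR_TO_SOCKET_OR_NULL);
	assert(old_st.acquired_refs == 1);

	/* The helper-based check leaves the non-acquiring helper alone. */
	check_call_by_helper(&new_st, HELPER_SK_FULLSOCK);
	assert(new_st.acquired_refs == 0);

	/* A real lookup still records a reference that must be released. */
	check_call_by_helper(&new_st, HELPER_SK_LOOKUP_TCP);
	assert(new_st.acquired_refs == 1);
}
```

Calling demo_acquire_gate() from a test main() exercises all three cases.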

Bit unfortunate that a helper like bpf_sk_fullsock() would be needed;
after all, this is more of an implementation detail which we would
expose here to the developer.

Is there a specific reason why fetching skb->sk couldn't already be of
the type PTR_TO_SOCKET_OR_NULL, such that the bpf_sk_fullsock() step
wouldn't be needed and most of the logic we have today could already be
reused (modulo refcnt avoidance)? In particular, do you need the skb->sk
without the full-sk part somewhere (e.g. in tw socks)?

Why not do something like sk_to_full_sk() inside the helper or, even
better, as a BPF ctx rewrite upon skb->sk to fetch the full sk parent,
where you could also access the remaining bpf_sock fields? This could
then also be plugged into bpf_tcp_sock(), given that this needs to be a
full sk anyway.

Thanks,
Daniel
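
A rough sketch of the sk_to_full_sk() idea, as a simplified userspace
model and not kernel code: the mock_* names, the three-state enum and the
struct layout are assumptions made for illustration only. It models a
load of skb->sk that maps a request sock to its listener and yields NULL
for anything that is still not a full socket (e.g. a timewait sock), so
the program would only ever see NULL or a full sk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model: only the socket states that matter for this question. */
enum mock_state { ST_ESTABLISHED, ST_LISTEN, ST_TIME_WAIT, ST_NEW_SYN_RECV };

struct mock_sock {
	enum mock_state state;
	struct mock_sock *listener;	/* set only for request socks */
};

/* Timewait and request socks are not full sockets. */
static bool mock_sk_fullsock(const struct mock_sock *sk)
{
	return sk->state != ST_TIME_WAIT && sk->state != ST_NEW_SYN_RECV;
}

/* A request sock is mapped to its listener; anything else is returned as is. */
static struct mock_sock *mock_sk_to_full_sk(struct mock_sock *sk)
{
	if (sk && sk->state == ST_NEW_SYN_RECV)
		sk = sk->listener;
	return sk;
}

/* What a rewritten skb->sk load could resolve to: a full sk or NULL. */
static struct mock_sock *mock_ctx_load_sk(struct mock_sock *skb_sk)
{
	struct mock_sock *sk = mock_sk_to_full_sk(skb_sk);

	return (sk && mock_sk_fullsock(sk)) ? sk : NULL;
}

/* Exercises the full, request, timewait and NULL cases. */
void demo_ctx_load_sk(void)
{
	struct mock_sock listener = { ST_LISTEN, NULL };
	struct mock_sock req = { ST_NEW_SYN_RECV, &listener };
	struct mock_sock tw = { ST_TIME_WAIT, NULL };
	struct mock_sock est = { ST_ESTABLISHED, NULL };

	assert(mock_ctx_load_sk(&est) == &est);		/* already a full sk */
	assert(mock_ctx_load_sk(&req) == &listener);	/* mapped to its parent */
	assert(mock_ctx_load_sk(&tw) == NULL);		/* no full sk to offer */
	assert(mock_ctx_load_sk(NULL) == NULL);		/* skb without a socket */
}
```

With this shape the program needs a single NULL check after reading
skb->sk, at the cost of never seeing the timewait or request sock itself,
which is why the question about tw socks above matters.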