On Mon, Dec 10, 2018 at 11:21 AM Florian Westphal <f...@strlen.de> wrote: > > The (out-of-tree) Multipath-TCP implementation needs a significant amount > of extra space in the skb control buffer. > > Increasing skb->cb[] size in mainline is a non-starter for memory and > and performance reasons (f.e. increase in cb size also moves several > frequently-accessed fields to other cache lines). > > One approach that might work for MPTCP is to extend skb_shared_info instead > of sk_buff. However, this comes with other drawbacks, e.g. it either > needs special skb allocation to make sure there is enough space for such > 'extended shinfo' at the end of data buffer (which would make this only > useable for the MPTCP tx path) or such a change would increase size of > skb_shared_info. > > This work adds an extension infrastructure for sk_buff: > 1. extension memory is released when the sk_buff is free'd. > 2. data is shared after cloning an skb. > > This is also how xfrm and bridge netfilter skb-associated data > (skb->sp and skb->nf_bridge) is handled. > > This series removes the nf_bridge pointer and makes bridge netfilter > use the extension infrastructure instead. > > Two new members are added to sk_buff: > 1. 'active_extensions' byte (filling a hole), telling which extensions > have been allocated for the skb. > 2. extension pointer, located at the end of the sk_buff. > If active_extensions is 0, its content is undefined. > > The 'nf_bridge' pointer is removed, i.e. sk_buff size remains the same. > > This adds extra code to skb clone and free paths (to deal with > refcount/free of extension area) but replaces the existing code that > deals with skb->nf_bridge. > > This patch only adds the basic infrastructure, the nf_bridge conversion > is added in the next patch. > > Conversion of skb->sp (ipsec/xfrm secpath) to an skb extension is planned > as a followup. > > Signed-off-by: Florian Westphal <f...@strlen.de> > --- > include/linux/skbuff.h | 119 +++++++++++++++++++++++++++++++++++- > net/Kconfig | 3 + > net/core/skbuff.c | 133 +++++++++++++++++++++++++++++++++++++++++ > net/ipv4/ip_output.c | 1 + > net/ipv6/ip6_output.c | 1 + > 5 files changed, 256 insertions(+), 1 deletion(-) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index b1831a5ca173..d715736eb734 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -245,6 +245,7 @@ struct iov_iter; > struct napi_struct; > struct bpf_prog; > union bpf_attr; > +struct skb_ext; > > #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) > struct nf_conntrack { > @@ -636,6 +637,7 @@ typedef unsigned char *sk_buff_data_t; > * @queue_mapping: Queue mapping for multiqueue devices > * @xmit_more: More SKBs are pending for this queue > * @pfmemalloc: skbuff was allocated from PFMEMALLOC reserves > + * @active_extensions: active extensions (skb_ext_id types) > * @ndisc_nodetype: router type (from link layer) > * @ooo_okay: allow the mapping of a socket to a queue to be changed > * @l4_hash: indicate hash is a canonical 4-tuple hash over transport > @@ -665,6 +667,7 @@ typedef unsigned char *sk_buff_data_t; > * @data: Data head pointer > * @truesize: Buffer size > * @users: User count - see {datagram,tcp}.c > + * @extensions: allocated extensions, valid if active_extensions is > nonzero > */ > > struct sk_buff { > @@ -747,7 +750,9 @@ struct sk_buff { > head_frag:1, > xmit_more:1, > pfmemalloc:1; > - > +#ifdef CONFIG_SKB_EXTENSIONS > + __u8 active_extensions; > +#endif
This byte could be saved by moving the bits to the first byte of the new extension. The likely cold cacheline now needs to be checked, but only if the pointer is non-NULL. If exclusively using accessor functions to access the struct, even this can be avoided if encoding the first 3 or 7 active extensions in the pointer itself. > /* fields enclosed in headers_start/headers_end are copied > * using a single memcpy() in __copy_skb_header() > */ > @@ -869,6 +874,11 @@ struct sk_buff { > *data; > unsigned int truesize; > refcount_t users; > + > +#ifdef CONFIG_SKB_EXTENSIONS > + /* only useable after checking ->active_extensions != 0 */ > + struct skb_ext *extensions; > +#endif > };