IP defrag processing is one of the remaining problematic layers in Linux.

It uses static hash tables of 1024 buckets, with up to 128 items per bucket.
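
To make the limits of such a fixed layout concrete, here is a minimal
userspace sketch. It is not the kernel code: the toy_* names, the modulo
hash and the singly linked chains are all assumptions made for
illustration. Only the two constants (1024 buckets, 128 items per bucket)
come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_BUCKETS   1024	/* bucket count quoted above */
#define TOY_MAX_DEPTH 128	/* per-bucket item limit quoted above */

/* Hypothetical item type; a real frag queue carries much more state. */
struct toy_item {
	uint32_t key;
	struct toy_item *next;
};

struct toy_table {
	struct toy_item *head[TOY_BUCKETS];
	unsigned int depth[TOY_BUCKETS];
	uint32_t seed;		/* changed on every rebuild in this model */
};

/* Hypothetical hash: seed-mixed key reduced modulo the bucket count. */
static unsigned int toy_hash(const struct toy_table *t, uint32_t key)
{
	return (key ^ t->seed) % TOY_BUCKETS;
}

/* Insert a key; refuse once the target bucket already holds 128 items. */
static bool toy_insert(struct toy_table *t, uint32_t key)
{
	unsigned int b;
	struct toy_item *it;

	assert(t != NULL);
	b = toy_hash(t, key);
	if (t->depth[b] >= TOY_MAX_DEPTH)
		return false;

	it = malloc(sizeof(*it));
	if (!it)
		return false;

	it->key = key;
	it->next = t->head[b];
	t->head[b] = it;
	t->depth[b]++;
	return true;
}

/* Upper bound on stored items, independent of available memory. */
static unsigned long toy_capacity(void)
{
	return (unsigned long)TOY_BUCKETS * TOY_MAX_DEPTH;
}
```

In this model the table can never hold more than toy_capacity() items,
however much memory the host has, and keys that land in the same bucket
are refused after the first 128. Inserting keys 0, 1024, 2048, ... into a
zero-initialized table shows the second effect, since they all hash to
bucket 0.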

A work queue is supposed to garbage collect items when the host is under
memory pressure, and to do a hash rebuild, changing the seed used in hash
computations.

This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
which occurs every 5 seconds if the host is under fire.

Then there is the problem of sharing this hash table among all netns.

It is time to switch to rhashtables, and to allocate one of them per netns
to speed up netns dismantle, since this is a critical metric these days.

Lookup now uses RCU, and 64bit hosts can now provision whatever amount of
memory is needed to handle the expected workloads.

v2: Addressed feedback from Herbert and Kirill
    (use rhashtable_free_and_destroy(), and split the big patch into
    small units)

v3: Removed the extra add_frag_mem_limit(...) from inet_frag_create()
    Removed the refcount_inc_not_zero() call from inet_frags_free_cb(),
    as we can exploit del_timer() return value.

v4: kbuild robot feedback about one missing static (squashed)
    Additional patches:
      inet: frags: do not clone skb in ip_expire()
      ipv6: frags: rewrite ip6_expire_frag_queue()
      rhashtable: reorganize struct rhashtable layout
      inet: frags: reorganize struct netns_frags
      inet: frags: get rid of ipfrag_skb_cb/FRAG_CB
      ipv6: frags: get rid of ip6frag_skb_cb/FRAG6_CB
      inet: frags: get rid of nf_ct_frag6_skb_cb/NFCT_FRAG6_CB

Eric Dumazet (19):
  ipv6: frag: remove unused field
  inet: frags: change inet_frags_init_net() return value
  inet: frags: add a pointer to struct netns_frags
  inet: frags: refactor ipv6_frag_init()
  inet: frags: refactor lowpan_net_frag_init()
  inet: frags: refactor ipfrag_init()
  rhashtable: add schedule points
  inet: frags: use rhashtables for reassembly units
  inet: frags: remove some helpers
  inet: frags: get rif of inet_frag_evicting()
  inet: frags: remove inet_frag_maybe_warn_overflow()
  inet: frags: break the 2GB limit for frags storage
  inet: frags: do not clone skb in ip_expire()
  ipv6: frags: rewrite ip6_expire_frag_queue()
  rhashtable: reorganize struct rhashtable layout
  inet: frags: reorganize struct netns_frags
  inet: frags: get rid of ipfrag_skb_cb/FRAG_CB
  ipv6: frags: get rid of ip6frag_skb_cb/FRAG6_CB
  inet: frags: get rid of nf_ct_frag6_skb_cb/NFCT_FRAG6_CB

 Documentation/networking/ip-sysctl.txt  |  11 +-
 include/linux/rhashtable.h              |   8 +-
 include/linux/skbuff.h                  |   1 +
 include/net/inet_frag.h                 | 126 ++++-----
 include/net/ip.h                        |   1 -
 include/net/ipv6.h                      |  27 +-
 lib/rhashtable.c                        |   2 +
 net/ieee802154/6lowpan/6lowpan_i.h      |  26 +-
 net/ieee802154/6lowpan/reassembly.c     | 150 +++++-----
 net/ipv4/inet_fragment.c                | 362 +++++-------------------
 net/ipv4/ip_fragment.c                  | 247 ++++++++--------
 net/ipv4/proc.c                         |   6 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c | 119 +++-----
 net/ipv6/proc.c                         |   5 +-
 net/ipv6/reassembly.c                   | 235 ++++++++-------
 15 files changed, 499 insertions(+), 827 deletions(-)

-- 
2.17.0.rc1.321.gba9d0f2565-goog