Currently, all netlink protocol handlers for updating rules, actions and qdiscs are protected with single global rtnl lock which removes any possibility for parallelism. This patch set is a third step to remove rtnl lock dependency from TC rules update path.
Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added. Handlers registered with this flag are called without RTNL taken. End goal is to have rule update handlers(RTM_NEWTFILTER, RTM_DELTFILTER, etc.) to be registered with UNLOCKED flag to allow parallel execution. However, there is no intention to completely remove or split rtnl lock itself. This patch set addresses specific problems in implementation of classifiers API that prevent its control path from being executed concurrently, and completes refactoring of cls API rules update handlers by removing rtnl lock dependency from code that handles chains and classifiers. Rules update handlers are registered with RTNL_FLAG_DOIT_UNLOCKED flag. This patch set substitutes global rtnl lock dependency on rules update path in cls API by extending its data structures with following locks: - tcf_block with 'lock' spinlock. It is used to protect block state, and life-time management fields of chains on the block (chain->refcnt, chain->action_refcnt, chain->explicitly crated, etc.). - tcf_chain with 'filter_chain_lock' spinlock, that is used to protect list of classifier instances attached to chain. - tcf_proto with 'lock' spinlock that is intended to be used to synchronize access to classifiers that support unlocked execution. Chain0 head change callbacks can be sleeping and cannot be protected by block spinlock. To solve this issue, sleeping miniqp swap function (used as head change callback by ingress and clsact Qdiscs) is refactored to offload sleeping operations to workqueue. New ordered workqueue 'tc_proto_workqueue' is created in cls_api to be used by miniqp and for tcf_proto deallocation, which is also moved to workqueue to prevent deallocation of tp's that are still in use by block. Performing both miniqp swap and tp deallocation on same ordered workqueue ensures that any pending head change requests involving tp are completed before the tp is deallocated. Classifiers are extended with reference counting to accommodate parallel access by unlocked cls API. Classifier ops structure is extended with additional 'put' function to allow reference counting of filters and intended to be used by classifiers that implement rtnl-unlocked API. Users of classifiers and individual filter instances are modified to always hold reference while working with them. Classifiers that support unlocked execution still need to know the status of rtnl lock, so their API is extended with additional 'rtnl_held' argument that is used to indicate that caller holds rtnl lock. Changes from V1 to V2: - Patch 1: - Use smp_store_release() instead of xchg() for setting miniqp->tp_head. - Move Qdisc deallocation to tc_proto_wq ordered workqueue that is used to destroy tcf proto instances. This is necessary to ensure that Qdisc is destroyed after all instances of chain/proto that it contains in order to prevent use-after-free error in tc_chain_notify_delete(). - Cache parent net device ifindex in block to prevent use-after-free of dev queue in tc_chain_notify_delete(). - Patch 2: - Use lockdep_assert_held() instead of spin_is_locked() for assertion. - Use 'free_block' argument in tcf_chain_destroy() instead of checking block's reference count and chain_list for second time. - Patch 7: - Refactor tcf_chain0_head_change_cb_add() to not take block->lock and chain0->filter_chain_lock in correct order. - Patch 10: - Always set 'tp_created' flag when creating tp to prevent releasing the chain twice when tp with same priority was inserted concurrently. - Patch 11: - Add additional check to prevent creation of new proto instance when parent chain is being flushed to reduce CPU usage. - Don't call tcf_chain_delete_empty() if tp insertion failed. - Patch 16 - new. - Patch 17: - Refactor to only lock take rtnl lock once (at the beginning of rule update handlers). - Always release rtnl mutex in the same function that locked it. Remove unlock code from tcf_block_release(). Vlad Buslov (17): net: sched: refactor mini_qdisc_pair_swap() to use workqueue net: sched: protect block state with spinlock net: sched: refactor tc_ctl_chain() to use block->lock net: sched: protect block->chain0 with block->lock net: sched: traverse chains in block with tcf_get_next_chain() net: sched: protect chain template accesses with block lock net: sched: lock the chain when accessing filter_chain list net: sched: introduce reference counting for tcf proto net: sched: traverse classifiers in chain with tcf_get_next_proto() net: sched: refactor tp insert/delete for concurrent execution net: sched: prevent insertion of new classifiers during chain flush net: sched: track rtnl lock status when validating extensions net: sched: extend proto ops with 'put' callback net: sched: extend proto ops to support unlocked classifiers net: sched: add flags to Qdisc class ops struct net: sched: refactor tcf_block_find() into standalone functions net: sched: unlock rules update API include/net/pkt_cls.h | 6 +- include/net/sch_generic.h | 81 ++- net/sched/cls_api.c | 1213 ++++++++++++++++++++++++++++++++++----------- net/sched/cls_basic.c | 14 +- net/sched/cls_bpf.c | 15 +- net/sched/cls_cgroup.c | 13 +- net/sched/cls_flow.c | 15 +- net/sched/cls_flower.c | 16 +- net/sched/cls_fw.c | 15 +- net/sched/cls_matchall.c | 16 +- net/sched/cls_route.c | 14 +- net/sched/cls_rsvp.h | 16 +- net/sched/cls_tcindex.c | 17 +- net/sched/cls_u32.c | 14 +- net/sched/sch_api.c | 10 +- net/sched/sch_generic.c | 56 ++- net/sched/sch_ingress.c | 4 + 17 files changed, 1175 insertions(+), 360 deletions(-) -- 2.7.5