On Tue 06 Apr 2021 at 01:56, Cong Wang <xiyou.wangc...@gmail.com> wrote: > On Sat, Apr 3, 2021 at 3:01 AM Vlad Buslov <vla...@nvidia.com> wrote: >> So, the following happens in reproduction provided in commit message >> when executing "tc actions add action simple sdata \"1\" index 1 >> action simple sdata \"2\" index 2" command: >> >> 1. tcf_action_init() is called with batch of two actions of same type, >> no module references are held, 'actions' array is empty: >> >> act_simple refcnt balance = 0 >> actions[] = {} >> >> 2. tc_action_load_ops() is called for first action: >> >> act_simple refcnt balance = +1 >> actions[] = {} >> >> 3. tc_action_load_ops() is called for second action: >> >> act_simple refcnt balance = +2 >> actions[] = {} >> >> 4. tcf_action_init_1() called for first action, succeeds, action >> instance is assigned to 'actions' array: >> >> act_simple refcnt balance = +2 >> actions[] = { [0]=act1 } >> >> 5. tcf_action_init_1() fails for second action, 'actions' array not >> changed, goto err: >> >> act_simple refcnt balance = +2 >> actions[] = { [0]=act1 } >> >> 6. tcf_action_destroy() is called for 'actions' array, last reference to >> first action is released, tcf_action_destroy_1() calls module_put() for >> actions module: >> >> act_simple refcnt balance = +1 >> actions[] = {} >> >> 7. err_mod loop starts iterating over ops array, executes module_put() >> for first actions ops: >> >> act_simple refcnt balance = 0 >> actions[] = {} >> >> 7. err_mod loop executes module_put() for second actions ops: >> >> act_simple refcnt balance = -1 >> actions[] = {} >> >> >> The goal of my fix is to not unconditionally release the module >> reference for successfully initialized actions because this is already >> handled by action destroy code. Hope this explanation clarifies things. > > Great explanation! It seems harder and harder to understand the > module refcnt here. How about we just take the refcnt when we > successfully create an action? Something like this: > > diff --git a/net/sched/act_api.c b/net/sched/act_api.c > index b919826939e0..075cc80480bf 100644 > --- a/net/sched/act_api.c > +++ b/net/sched/act_api.c > @@ -493,6 +493,7 @@ int tcf_idr_create(struct tc_action_net *tn, u32 > index, struct nlattr *est, > } > > p->idrinfo = idrinfo; > + __module_get(ops->owner); > p->ops = ops; > *a = p; > return 0; > @@ -1035,13 +1036,6 @@ struct tc_action *tcf_action_init_1(struct net > *net, struct tcf_proto *tp, > if (!name) > a->hw_stats = hw_stats; > > - /* module count goes up only when brand new policy is created > - * if it exists and is only bound to in a_o->init() then > - * ACT_P_CREATED is not returned (a zero is). > - */ > - if (err != ACT_P_CREATED) > - module_put(a_o->owner); > - > return a; > > err_out: > @@ -1100,7 +1094,8 @@ int tcf_action_init(struct net *net, struct > tcf_proto *tp, struct nlattr *nla, > tcf_idr_insert_many(actions); > > *attr_size = tcf_action_full_attrs_size(sz); > - return i - 1; > + err = i - 1; > + goto err_mod: > > err: > tcf_action_destroy(actions, bind); > > The idea is on the higher level we hold refcnt when loading module and > put it back _unconditionally_ when returning, and hold a refcnt only when > we create an action and conditionally put it back when an error happens. > With pseudo code, it is something like this: > > load_ops() // module refcnt +1 > init_actions(); // module refcnt +1 only when create a new one > if (err) > // module refcnt -1 when we delete one > module_put(); > module_put(); // module refcnt -1 > > This looks much easier to track. What do you think? > > Thanks!
Indeed, your suggestion looks more straightforward. The only thing we need to mind is that action->init() callbacks assume that caller releases the module even after calling tcf_idr_create(), so we also need to modify tcf_idr_release() (used by error handlers in action->init() implementations) to release the module. I'll run some tests tomorrow to verify that I'm not missing anything else. Regards, Vlad