On Tue, Apr 18, 2017 at 10:01 AM, Daniel Borkmann <dan...@iogearbox.net> wrote:
> Hi Cong,
>
> sorry for the late reply. Generally the patch looks good to me, just
> a few comments inline:
>
> On 04/17/2017 08:30 PM, Cong Wang wrote:
>>
>> Roi reported we could have a race condition where in the ->classify()
>> path we dereference tp->root while a parallel ->destroy() makes it
>> NULL.
>
> Correct.
>
>> This is possible because ->destroy() could be called when deleting
>> a filter, to check if we are the last one in the tp; this tp is still
>> linked and visible at that time.
>>
>> Daniel fixed this in commit d936377414fa
>> ("net, sched: respect rcu grace period on cls destruction"), but
>> the root cause of this problem is the semantics of ->destroy(): it
>> does two things (for the non-force case):
>>
>> 1) check if tp is empty
>> 2) if tp is empty, really destroy it
>>
>> and its caller, if it cares, needs to check the return value to see
>> whether tp was really destroyed. Therefore we can't unlink tp unless
>> we know it is empty.
>>
>> As suggested by Daniel, we could actually move the test logic to
>> ->delete(), so that we can safely unlink tp after ->delete() tells us
>> the last one was just deleted, and before ->destroy().
>>
>> What's more, even if we unlink it before ->destroy(), it could still
>> have readers, since we don't wait for a grace period here, so we should
>> not modify tp->root in ->destroy() either.
>
> Here seems to be a bit of a mixup in this analysis, imo. The issue
> that Roi reported back then was exactly the one that d936377414fa ("net,
> sched: respect rcu grace period on cls destruction") fixed, which
> affected flower and other classifiers:
>
> Roi reported a crash in flower where tp->root was NULL in ->classify()
> callbacks. Reason is that in ->destroy() tp->root is set to NULL via
> RCU_INIT_POINTER().
> It's problematic for some of the classifiers, because
> this doesn't respect the RCU grace period for them, and as a result,
> still outstanding readers from tc_classify() will try to blindly
> dereference a NULL tp->root.
>
> The ->delete() callback was never used by Roi back then; he said that
> he just removed the ingress qdisc in his test, which implicitly purges
> all cls attached to it via tcf_destroy_chain(). So the above description
> with regards to the "root cause" of Roi's reported issue is not correct.
Hmm, thanks for clarifying this. I will remove that part, together with
the Reported-by of Roi.

> The issue that this patch fixes is an _independent_ race that we found
> while auditing the code when looking into Roi's report back then. It
> fixes commit 1e052be69d04 ("net_sched: destroy proto tp when all filters
> are gone"), which added the RCU_INIT_POINTER() after the tcf_destroy()
> in the RTM_DELTFILTER case. That part of the description looks good,
> where you describe that "[...] we need to move the test logic to
> ->delete(), so that we can safely unlink tp after ->delete() tells us
> the last one is just deleted and before ->destroy()."

OK.

> Please also add a Fixes tag, so it can be better tracked for backports.
>
> Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are
> gone")

Actually, I intentionally removed the Fixes tag because I don't think we
need to backport it to stable: no one has reported a _real_ crash so far,
right? Or did you see a real one? (Not to mention that its size does not
fit -stable either.)

> The above three RCU_INIT_POINTER(tp->root, NULL) are independent
> of the fix and actually do no harm right now. I described that in
> d936377414fa ("net, sched: respect rcu grace period on cls destruction")
> as well, meaning that they each handle tp->root being NULL in the
> ->classify() path (for historic reasons), so this is handled gracefully:
> readers use rcu_dereference_bh(tp->root) and test for it being NULL.
>
> But I agree that this could be cleaned up along with the check in the
> ->classify() callbacks for these three (not sure if it's really worth
> it, though). However, such a cleanup should be a separate patch and not
> included in this fix.

Agreed. I will make it a separate patch and send them together.

Thanks.