On Wed, Feb 27, 2019 at 6:28 AM Vlad Buslov <vla...@mellanox.com> wrote:
>
>
> On Tue 26 Feb 2019 at 22:38, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> > On Tue, Feb 26, 2019 at 7:08 AM Vlad Buslov <vla...@mellanox.com> wrote:
> >>
> >>
> >> On Mon 25 Feb 2019 at 22:52, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> >> > On Mon, Feb 25, 2019 at 7:38 AM Vlad Buslov <vla...@mellanox.com> wrote:
> >> >>
> >> >> Using tcf_walker->stop flag to determine when tcf_walker->fn() was called
> >> >> at least once is unreliable. Some classifiers set 'stop' flag on error
> >> >> before calling walker callback, other classifiers used to call it with NULL
> >> >> filter pointer when empty. In order to prevent further regressions, extend
> >> >> tcf_walker structure with dedicated 'nonempty' flag. Set this flag in
> >> >> tcf_walker->fn() implementation that is used to check if classifier has
> >> >> filters configured.
> >> >
> >> >
> >> > So, after this patch commits like 31a998487641 ("net: sched: fw: don't
> >> > set arg->stop in fw_walk() when empty") can be reverted??
> >>
> >> Yes, it is safe now to revert following commits:
> >>
> >> 3027ff41f67c ("net: sched: route: don't set arg->stop in route4_walk() when empty")
> >> 31a998487641 ("net: sched: fw: don't set arg->stop in fw_walk() when empty")
> >
> > Yeah, and probably commit d66022cd1623
> > ("net: sched: matchall: verify that filter is not NULL in mall_walk()").
> >
> > Please send a patch to revert them all.
> >
> > Thanks.
>
> I think commit d66022cd1623 ("net: sched: matchall: verify that filter
> is not NULL in mall_walk()") and commit 8b58d12f4ae1 ("net: sched:
> cgroup: verify that filter is not NULL during walk") shouldn't be
> reverted. They are still necessary to prevent tcf_chain_dump() from
> dumping NULL filter pointer. It can happen when dump is initiated in
> parallel with inserting first filter to unlocked classifier.
> tcf_fill_node() verifies that filter pointer is not NULL, so it will not
> crash, but will output tcf_proto info for second time. This might
> "confuse" user-space.

I don't get this. First of all, what exactly gets confused here?

Secondly, if there is something confusing, isn't it all because of your
parallel algorithm? That is, the retry logic.

I don't see how commit d66022cd1623 could be useful in this context. It
helps to prevent a NULL crash, which isn't a concern as long as the filter
pointer is checked in tcf_fill_node(), as you described.

Thanks.
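
For readers following the thread, the difference between the 'stop' based
emptiness check and the dedicated 'nonempty' flag from the quoted patch
description can be sketched in user space. This is a simplified model, not
the kernel code: struct walker, noop_cb(), check_empty_cb(), the two walk
functions and the is_empty_*() helpers are all hypothetical stand-ins, and
filters are modelled as plain void pointers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the walker discussed in the thread. */
struct walker {
    int stop;       /* set by a walk when it terminates early */
    bool nonempty;  /* set only by the callback when it sees a real filter */
    int (*fn)(void *filter, struct walker *arg);
};

typedef void (*walk_fn)(void **filters, size_t n, struct walker *arg);

/* Old-style probe callback: always asks the walk to stop. */
static int noop_cb(void *filter, struct walker *arg)
{
    (void)filter;
    (void)arg;
    return -1;
}

/* New-style probe callback: records that a non-NULL filter was seen. */
static int check_empty_cb(void *filter, struct walker *arg)
{
    if (filter) {
        arg->nonempty = true;
        return -1; /* one filter is enough, stop walking */
    }
    return 0;
}

/* Hypothetical classifier that calls the callback with NULL when empty. */
static void walk_null_when_empty(void **filters, size_t n, struct walker *arg)
{
    if (n == 0) {
        if (arg->fn(NULL, arg) < 0)
            arg->stop = 1;
        return;
    }
    for (size_t i = 0; i < n; i++) {
        if (arg->fn(filters[i], arg) < 0) {
            arg->stop = 1;
            break;
        }
    }
}

/* Hypothetical classifier that sets 'stop' on error before any callback. */
static void walk_error_first(void **filters, size_t n, struct walker *arg)
{
    (void)filters;
    (void)n;
    arg->stop = 1;
}

/* Emptiness inferred from 'stop': misreports both classifiers above. */
static bool is_empty_by_stop(walk_fn walk, void **filters, size_t n)
{
    struct walker arg = { .fn = noop_cb };

    walk(filters, n, &arg);
    return !arg.stop;
}

/* Emptiness taken from the dedicated flag set inside the callback. */
static bool is_empty_by_flag(walk_fn walk, void **filters, size_t n)
{
    struct walker arg = { .fn = check_empty_cb };

    walk(filters, n, &arg);
    return !arg.nonempty;
}
```

In this model, is_empty_by_stop() reports an empty classifier as non-empty
for both walk variants, while is_empty_by_flag() reports it as empty. Note
that the sketch only covers the emptiness check; it says nothing about the
separate question raised above of whether the NULL checks in the walk
functions are still needed for the dump path.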