On Mon, Dec 14, 2020 at 7:13 AM Maxim Mikityanskiy <maxi...@nvidia.com> wrote:
>
> On 2020-12-11 21:16, Cong Wang wrote:
> > On Fri, Dec 11, 2020 at 7:26 AM Maxim Mikityanskiy <maxi...@mellanox.com>
> > wrote:
> >>
> >> HTB doesn't scale well because of contention on a single lock, and it
> >> also consumes CPU. This patch adds support for offloading HTB to
> >> hardware that supports hierarchical rate limiting.
> >>
> >> This solution addresses two main problems of scaling HTB:
> >>
> >> 1. Contention by flow classification. Currently the filters are attached
> >> to the HTB instance as follows:
> >
> > I do not think this is the reason, tcf_classify() has been called with RCU
> > only on the ingress side for a rather long time. What contentions are you
> > talking about here?
>
> When one attaches filters to HTB, tcf_classify is called from
> htb_classify, which is called from htb_enqueue, which is called with the
> root spinlock of the qdisc taken.
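[For readers following along: the attachment style under discussion looks
roughly like the sketch below. Device name, rates, and the u32 match are
made up for illustration; the point is only that the filter hangs off the
HTB root, so classification runs in the enqueue path while the root qdisc
spinlock is held.]

```shell
# Hypothetical HTB setup with a filter attached to the HTB root.
# With this layout, every enqueued packet goes through
# htb_enqueue() -> htb_classify() -> tcf_classify() under the root
# qdisc spinlock, which is the contention being discussed.
tc qdisc add dev eth0 root handle 1: htb default 20
tc class add dev eth0 parent 1: classid 1:1 htb rate 1gbit
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 500mbit ceil 1gbit
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 500mbit ceil 1gbit
tc filter add dev eth0 parent 1: protocol ip u32 \
    match ip dport 80 0xffff flowid 1:10
```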
So it has nothing to do with tcf_classify() itself... :-/

[...]

> > And doesn't TBF already work with mq? I mean you can attach it as
> > a leaf to each mq so that the tree lock will not be shared either, but you'd
> > lose the benefits of a global rate limit too.
>
> Yes, I'd lose not only the global rate limit, but also multi-level
> hierarchical limits, which are all provided by this HTB offload - that's
> why TBF is not really a replacement for this feature.

Interesting. Please explain how your HTB offload still provides a global
rate limit and borrowing across queues, because I can't see it. All I can
see is that you offload HTB into each queue in ->attach(), where I assume
the hardware rate-limits each queue. If the hardware also has a global
control, why is it not reflected on the root qdisc?

Thanks!
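[For reference, the mq+TBF arrangement mentioned above can be sketched as
follows; the device name, queue count, and rates are hypothetical. Each TBF
leaf limits only its own TX queue, which illustrates why this setup avoids
the shared tree lock but gives neither a global limit nor borrowing between
queues.]

```shell
# Hypothetical sketch: one independent TBF limiter per TX queue under mq.
tc qdisc add dev eth0 root handle 1: mq
tc qdisc add dev eth0 parent 1:1 handle 11: tbf rate 100mbit burst 32kb latency 50ms
tc qdisc add dev eth0 parent 1:2 handle 12: tbf rate 100mbit burst 32kb latency 50ms
# Each queue has its own qdisc lock and its own 100mbit cap; the caps are
# not a shared budget, and an idle queue cannot lend bandwidth to a busy one.
```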