On Mon, Dec 14, 2020 at 7:13 AM Maxim Mikityanskiy <maxi...@nvidia.com> wrote:
>
> On 2020-12-11 21:16, Cong Wang wrote:
> > On Fri, Dec 11, 2020 at 7:26 AM Maxim Mikityanskiy <maxi...@mellanox.com> 
> > wrote:
> >>
> >> HTB doesn't scale well because of contention on a single lock, and it
> >> also consumes CPU. This patch adds support for offloading HTB to
> >> hardware that supports hierarchical rate limiting.
> >>
> >> This solution addresses two main problems of scaling HTB:
> >>
> >> 1. Contention by flow classification. Currently the filters are attached
> >> to the HTB instance as follows:
> >
> > I do not think this is the reason, tcf_classify() has been called with RCU
> > only on the ingress side for a rather long time. What contentions are you
> > talking about here?
>
> When one attaches filters to HTB, tcf_classify is called from
> htb_classify, which is called from htb_enqueue, which is called with the
> root spinlock of the qdisc taken.

So it has nothing to do with tcf_classify() itself... :-/

[...]

> > And doesn't TBF already work with mq? I mean you can attach it as
> > a leaf to each mq so that the tree lock will not be shared either, but you'd
> > lose the benefits of a global rate limit too.
>
> Yes, I'd lose not only the global rate limit, but also multi-level
> hierarchical limits, which are all provided by this HTB offload - that's
> why TBF is not really a replacement for this feature.

Interesting, please explain how your HTB offload still has a global rate
limit and borrowing across queues? I simply can't see it, all I can see
is you offload HTB into each queue in ->attach(), where I assume the
hardware will do rate limit on each queue, if the hardware also has a
global control, why it is not reflected on the root qdisc?

Thanks!

Reply via email to