On Fri, Jun 26, 2020 at 3:46 AM Maxim Mikityanskiy <maxi...@mellanox.com> wrote:
>
> HTB doesn't scale well because of contention on a single lock, and it
> also consumes CPU. Mellanox hardware supports hierarchical rate limiting
> that can be leveraged by offloading the functionality of HTB.

True, essentially because it has to enforce a global rate limit with
link sharing.

There is a proposal to add a new lockless shaping qdisc, which you can
find on the netdev list.

>
> Our solution addresses two problems of HTB:
>
> 1. Contention by flow classification. Currently the filters are attached
> to the HTB instance as follows:
>
> # tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80
> classid 1:10
>
> It's possible to move classification to clsact egress hook, which is
> thread-safe and lock-free:
>
> # tc filter add dev eth0 egress protocol ip flower dst_port 80
> action skbedit priority 1:10
>
> This way classification still happens in software, but the lock
> contention is eliminated, and it happens before selecting the TX queue,
> allowing the driver to translate the class to the corresponding hardware
> queue.
>
> Note that this is already compatible with non-offloaded HTB and doesn't
> require changes to the kernel nor iproute2.
>
> 2. Contention by handling packets. HTB is not multi-queue, it attaches
> to a whole net device, and handling of all packets takes the same lock.
> Our solution offloads the logic of HTB to the hardware and registers HTB
> as a multi-queue qdisc, similarly to how mq qdisc does, i.e. HTB is
> attached to the netdev, and each queue has its own qdisc. The control
> flow is performed by HTB, it replicates the hierarchy of classes in
> hardware by calling callbacks of the driver. Leaf classes are presented
> by hardware queues. The data path works as follows: a packet is
> classified by clsact, the driver selects the hardware queue according
> to its class, and the packet is enqueued into this queue's qdisc.

Are you sure the HTB algorithm still works once each HTB class is
effectively separated onto its own queue? I think the classes must still
share some state when they borrow bandwidth from each other. This is why
I doubt you can simply add a ->attach() without touching the core
algorithm.

Thanks.
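
To make the borrowing concern concrete, here is a minimal sketch of a two-leaf hierarchy. The device name, handles, and rates are made up for illustration and are not taken from the patch set. The script only prints the tc commands by default; running it with TC=tc as root would apply them.

```shell
#!/bin/sh
# Illustrative hierarchy only: device name, handles, and rates are assumptions.
# Prints the tc commands by default; run with TC=tc as root to apply them.
TC=${TC:-echo tc}
DEV=${DEV:-eth0}

setup_htb() {
    # Root HTB qdisc; unclassified traffic goes to class 1:20.
    $TC qdisc add dev "$DEV" root handle 1: htb default 20
    # Inner class 1:1 holds the 100mbit that both leaves share.
    $TC class add dev "$DEV" parent 1: classid 1:1 htb rate 100mbit ceil 100mbit
    # Leaves are guaranteed 60mbit and 40mbit, and may borrow from 1:1
    # up to their ceil when the sibling is not using its share.
    $TC class add dev "$DEV" parent 1:1 classid 1:10 htb rate 60mbit ceil 100mbit
    $TC class add dev "$DEV" parent 1:1 classid 1:20 htb rate 40mbit ceil 100mbit
}

setup_htb
```

In this layout 1:10 can exceed its 60mbit only by borrowing from 1:1, and how much it can borrow depends on what 1:20 is sending at that moment. That dependency between sibling leaves is the shared state I am asking about.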