On Mon, Nov 13, 2017 at 10:17 AM, Michael Ma <make0...@gmail.com> wrote:
> 2017-11-12 16:14 GMT-08:00 Stephen Hemminger <step...@networkplumber.org>:
>> On Sun, 12 Nov 2017 13:43:13 -0800
>> Michael Ma <make0...@gmail.com> wrote:
>>
>>> Any comments? We plan to implement this as a qdisc and appreciate any
>>> early feedback.
>>>
>>> Thanks,
>>> Michael
>>>
>>> > On Nov 9, 2017, at 5:20 PM, Michael Ma <make0...@gmail.com> wrote:
>>> >
>>> > Currently txq/qdisc selection is based on flow hash, so packets from
>>> > the same flow will keep their order when they enter the qdisc/txq,
>>> > which avoids the out-of-order problem.
>>> >
>>> > To improve the concurrency of the QoS algorithm we plan to have
>>> > multiple per-cpu queues for a single TC class and do busy polling
>>> > from a per-class thread to drain these queues. If we can do this
>>> > frequently enough, the out-of-order situation in this polling thread
>>> > should not be that bad.
>>> >
>>> > To give more details: in the send path we introduce per-cpu per-class
>>> > queues so that packets from the same class and the same core will be
>>> > enqueued to the same place. Then a per-class thread polls the queues
>>> > belonging to its class from all the cpus and aggregates them into
>>> > another per-class queue. This can effectively reduce contention but
>>> > inevitably introduces a potential out-of-order issue.
>>> >
>>> > Any concerns/suggestions about working in this direction?
>>
>> In general, there are no meta design discussions in Linux development.
>> Several developers have tried to do lockless qdiscs and similar things
>> in the past.
>>
>> The devil is in the details; show us the code.
>
> Thanks for the response, Stephen. The code is fairly straightforward;
> we have a per-cpu per-class queue defined like this:
>
> struct bandwidth_group
> {
>     struct skb_list queues[MAX_CPU_COUNT];
>     struct skb_list drain;
> };
>
> The "drain" queue is used to aggregate the per-cpu queues belonging to
> the same class. In the enqueue function, we determine the cpu where the
> packet is processed and enqueue it to the corresponding per-cpu queue:
>
> int cpu;
> struct bandwidth_group *bwg = &bw_rx_groups[bwgid];
>
> cpu = get_cpu();
> skb_list_append(&bwg->queues[cpu], skb);
> put_cpu(); /* get_cpu() disables preemption, so re-enable it here */
>
> Here we don't check the flow of the packet, so if there is task
> migration or multiple threads sending packets through the same flow, we
> can theoretically have packets enqueued to different queues and
> aggregated into the "drain" queue out of order.
>
> Also, AFAIK there is no lockless htb-like qdisc implementation
> currently; however, if there is already a similar effort ongoing,
> please let me know.
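To make the other half of the design concrete: the per-class drain thread
Michael describes, which splices every per-cpu queue of one class into the
shared "drain" queue, could look roughly like the sketch below.
bw_group_thread(), skb_list_splice(), and the synchronization details are
assumptions extrapolated from the snippets above, not code from an actual
patch:

    static int bw_group_thread(void *data)
    {
        struct bandwidth_group *bwg = data;
        int cpu;

        while (!kthread_should_stop()) {
            /* Splice each per-cpu queue into the class-wide drain
             * queue in one shot; skb_list_splice() is assumed to
             * synchronize with concurrent enqueues on that cpu.
             */
            for_each_possible_cpu(cpu)
                skb_list_splice(&bwg->queues[cpu], &bwg->drain);

            /* Dequeue from bwg->drain here and pass the skbs on to
             * the per-class rate-limiting/transmit logic.
             */

            cond_resched();
        }
        return 0;
    }

Presumably there would be one such kthread per class, started with
something like kthread_run(bw_group_thread, bwg, "bwg/%d", bwgid).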
The question I would have is how this would differ from using XPS w/
mqprio? Would this be a classful qdisc like HTB or a classless one like
mqprio?

From what I can tell, XPS would already get you your per-cpu
functionality, and its benefit is that it avoids out-of-order issues for
sockets originating on the local system. The only thing I see as an issue
right now is that rate limiting with mqprio is assumed to be handled in
hardware via mechanisms such as DCB.

- Alex
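For reference, the baseline Alex is pointing at, mqprio with XPS steering,
is configured from user space; the device name, queue layout, and CPU
masks below are made-up example values, not a recommendation:

    # 4 traffic classes, 2 tx queues each, on a hypothetical eth0;
    # "hw 1" delegates rate handling to the hardware (e.g. via DCB).
    tc qdisc add dev eth0 root mqprio num_tc 4 \
        map 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 \
        queues 2@0 2@2 2@4 2@6 hw 1

    # Pin each tx queue to a CPU (hex mask) with XPS, so locally
    # originated traffic from a given cpu keeps using the same txq,
    # which is what avoids reordering for local sockets.
    echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
    echo 2 > /sys/class/net/eth0/queues/tx-1/xps_cpus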