On Wed, 2016-11-30 at 18:50 -0800, Eric Dumazet wrote:
> On Wed, 2016-11-30 at 18:32 -0800, Eric Dumazet wrote:
>
> > I simply suggest we try to queue the qdisc for further servicing as we
> > do today, from net_tx_action(), but we might use a different bit, so
> > that we leave the opportunity for another cpu to get __QDISC_STATE_SCHED
> > before we grab it from net_tx_action(), maybe 100 usec later (time to
> > flush all skbs queued in napi_consume_skb() and maybe RX processing,
> > since most NIC handle TX completion before doing RX processing from their
> > napi poll handler.
> >
> > Should be doable with few changes in __netif_schedule() and
> > net_tx_action(), plus some control paths that will need to take care of
> > the new bit at dismantle time, right ?
>
> Hmm... this is silly. Code already implements a different bit.
>
> qdisc_run() seems to run more often from net_tx_action(), I have to find
> out why.
After more analysis, I believe TSQ (TCP Small Queues) was one of the bottlenecks. I prepared a patch series that helped in my use cases.