While BQL bulk dequeue works well for TSO packets, it is not very efficient as soon as GSO is involved.

On a GSO-only workload (UDP or TCP), this patch series can save about 8% of CPU cycles on a 40Gbit mlx4 NIC, by keeping optimal batching and avoiding expensive qdisc requeues and reschedules.

This patch series:

- Adds netdev_tx_sent_queue_more() so that drivers can implement efficient BQL and xmit_more support.
- Implements a workaround in dev_hard_start_xmit() for drivers not using netdev_tx_sent_queue_more().
- Changes mlx4 to use netdev_tx_sent_queue_more().

Eric Dumazet (3):
  net: bql: add netdev_tx_sent_queue_more() helper
  net: do not abort bulk send on BQL status
  net/mlx4_en: use netdev_tx_sent_queue_more()

 drivers/net/ethernet/mellanox/mlx4/en_tx.c | 10 ++++++++--
 include/linux/netdevice.h                  | 12 ++++++++++++
 net/core/dev.c                             |  2 +-
 3 files changed, 21 insertions(+), 3 deletions(-)

-- 
2.19.1.568.g152ad8e336-goog