Under very high TX stress, the CPU handling NIC TX completions can spend a considerable number of cycles in TSQ (TCP Small Queues) logic.
This patch series avoids some atomic operations, but the most important patch is the third one. It lets other CPUs that are processing ACK packets and calling tcp_write_xmit() grab TCP_TSQ_DEFERRED, so that tcp_tasklet_func() can skip sockets that have already been processed. This avoids many lock acquisitions and cache line accesses, particularly under load.

Eric Dumazet (4):
  tcp: tsq: add tsq_flags / tsq_enum
  tcp: tsq: remove one locked operation in tcp_wfree()
  tcp: tsq: add shortcut in tcp_tasklet_func()
  tcp: tsq: avoid one atomic in tcp_wfree()

 include/linux/tcp.h   | 11 ++++++++++-
 net/ipv4/tcp_output.c | 54 +++++++++++++++++++++++++++++++--------------------
 2 files changed, 43 insertions(+), 22 deletions(-)

-- 
2.8.0.rc3.226.g39d4020
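
For readers who want to see the shortcut from the third patch in isolation, here is a minimal userspace sketch using C11 atomics. It is not the kernel code: the sketch_* names, the flag values, and the xmit_cycles counter are all hypothetical, and socket locking is left out entirely. It only illustrates the handoff described above: the completion path marks a transmit cycle as owed, an ACK-processing CPU that is about to transmit anyway may clear that mark, and the tasklet then skips the socket.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, loosely modeled on the TSQ flags named in this series. */
enum {
    SKETCH_TSQ_QUEUED   = 1u << 0, /* socket sits on the tasklet's list */
    SKETCH_TSQ_DEFERRED = 1u << 1, /* a transmit cycle is still owed */
};

/* Hypothetical stand-in for a socket; not a kernel structure. */
struct sketch_sock {
    atomic_uint tsq_flags;
    atomic_uint xmit_cycles; /* how many transmit cycles actually ran */
};

/*
 * TX completion path: record that a transmit cycle is owed and that the
 * socket is queued. Returns true if the caller has to add the socket to
 * the tasklet list, false if it was already queued.
 */
static bool sketch_tx_completion(struct sketch_sock *sk)
{
    unsigned int old = atomic_fetch_or(&sk->tsq_flags,
                                       SKETCH_TSQ_QUEUED | SKETCH_TSQ_DEFERRED);
    return !(old & SKETCH_TSQ_QUEUED);
}

/*
 * ACK path, possibly on another CPU: it is about to transmit anyway, so it
 * claims the deferred work. The plain load keeps the common case free of an
 * atomic read-modify-write.
 */
static void sketch_ack_xmit(struct sketch_sock *sk)
{
    if (atomic_load(&sk->tsq_flags) & SKETCH_TSQ_DEFERRED)
        atomic_fetch_and(&sk->tsq_flags, ~(unsigned int)SKETCH_TSQ_DEFERRED);
    atomic_fetch_add(&sk->xmit_cycles, 1);
}

/*
 * Tasklet: dequeue the socket and transmit only if the deferred work is
 * still owed. Returns true if it transmitted, false if it took the shortcut.
 */
static bool sketch_tasklet_run(struct sketch_sock *sk)
{
    assert(sk != NULL);
    unsigned int old = atomic_fetch_and(&sk->tsq_flags,
        ~(unsigned int)(SKETCH_TSQ_QUEUED | SKETCH_TSQ_DEFERRED));
    if (!(old & SKETCH_TSQ_DEFERRED))
        return false; /* another CPU already ran the transmit cycle */
    atomic_fetch_add(&sk->xmit_cycles, 1);
    return true;
}
```

In this sketch, calling sketch_tx_completion(), then sketch_ack_xmit(), then sketch_tasklet_run() on the same socket makes the tasklet return false, with exactly one transmit cycle counted. Without the intervening ACK path, the tasklet runs the cycle itself.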