Similarly to what has been done earlier with other actions [1][2], this series tries to improve the performance of the 'csum' tc action by removing a spinlock from the data plane. Patch 1 lets act_csum use per-CPU counters; patch 2 removes the spin_{,un}lock_bh() calls from the act() method.
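
As a rough illustration of why per-CPU counters make the lock unnecessary for statistics, here is a minimal userspace sketch. It is not the act_csum code and does not use the kernel's per-CPU API: NSLOTS, struct percpu_stats, percpu_stats_update() and percpu_stats_read() are hypothetical names, and the "cpu" index is passed in by hand. The assumption is that each CPU only ever writes its own slot, so writers never contend, and a reader sums all slots when a total is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 4	/* hypothetical number of CPUs */

/* Counters owned by a single CPU. */
struct slot_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* One slot per CPU instead of one shared, lock-protected counter. */
struct percpu_stats {
	struct slot_stats slot[NSLOTS];
};

/* Fast path: touch only the caller's own slot, no lock taken. */
static void percpu_stats_update(struct percpu_stats *s, unsigned int cpu,
				uint64_t len)
{
	assert(cpu < NSLOTS);
	s->slot[cpu].packets += 1;
	s->slot[cpu].bytes += len;
}

/* Slow path: fold all slots into a single total for reporting. */
static struct slot_stats percpu_stats_read(const struct percpu_stats *s)
{
	struct slot_stats total = { 0, 0 };
	size_t i;

	for (i = 0; i < NSLOTS; i++) {
		total.packets += s->slot[i].packets;
		total.bytes += s->slot[i].bytes;
	}
	return total;
}
```

The trade-off in this sketch is that reads become more expensive (one pass over all slots) while updates stay local to one slot, which suits counters that are updated per packet and read only occasionally.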

test procedure:

 # ip link add name eth1 type dummy
 # ip link set dev eth1 up
 # tc qdisc add dev eth1 root handle 1: prio
 # tc filter add dev eth1 parent 1: matchall action csum udp
 # for n in 2 4 6 8 10 12 14 16; do
 > ./pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64 -t $n -n 1000000 -i eth1
 > done

test results:

 $n | avg. pps/core   | avg. pps/core | delta
    | (without patch) | (with patch)  | (%)
 ---+-----------------+---------------+------
  2 |          484915 |        547716 |    13
  4 |          209551 |        254439 |    21
  6 |          143901 |        164695 |    14
  8 |          112423 |        127821 |    14
 10 |           91134 |        102950 |    13
 12 |           75374 |         85499 |    13
 14 |           64586 |         73426 |    14
 16 |           56635 |         64111 |    13

references:

[1] http://www.spinics.net/lists/netdev/msg334760.html
[2] https://www.spinics.net/lists/netdev/msg465862.html

Davide Caratti (2):
  net/sched: act_csum: use per-core statistics
  net/sched: act_csum: don't use spinlock in the fast path

 net/sched/act_csum.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

-- 
2.13.6