A 'perf' report on current master shows that kernel-side locking takes more than 20% of overall OVS execution time under TCP_CRR tests, with flow tables that send all new connections to userspace and create a huge number of kernel flows (~60000) on a 32-CPU server. It turns out that disabling per-CPU flow stats largely removes this overhead and significantly improves performance on *this kind of test*.
To address this problem, this series:

- reduces overhead by not locking all-zero stats
- keeps flow stats on NUMA nodes, essentially avoiding locking between
  physical CPUs.  Stats readers still need to read/lock across the
  NUMA nodes, but now once instead of 16 times in the test case
  described above.

In order to avoid performance regressions elsewhere, this series also
introduces prefetching to avoid stalling due to stats being out of the
L1 cache, for both readers and writers.

With all of these applied, the OVS kernel-side locking overhead no
longer appears near the top of 'perf' reports.  The effectiveness of
these strategies under different load scenarios requires more testing.

Jarno Rajahalme (4):
  datapath/flow.c: Only read stats if non-zero.
  datapath: NUMA node flow stats.
  datapath: Prefetch flow stats.
  datapath: Update flow stats after execute_actions.

 datapath/datapath.c   |   47 ++++++++++++++++++++++-
 datapath/flow.c       |  101 ++++++++++++++++---------------------------------
 datapath/flow.h       |    8 ++--
 datapath/flow_table.c |   44 +++++++++++++--------
 datapath/flow_table.h |    2 +
 5 files changed, 114 insertions(+), 88 deletions(-)

-- 
1.7.10.4

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev