A 'perf' report on current master shows that kernel-side locking takes
more than 20% of the overall OVS execution time under TCP_CRR tests
with flow tables that send all new connections to userspace and that
create a huge number of kernel flows (~60000) on a 32-CPU server.  It
turns out that disabling per-CPU flow stats largely removes this
overhead and significantly improves performance on *this kind of
test*.

To address this problem, this series:
- Reduces overhead by not locking all-zero stats.
- Keeps flow stats per NUMA node, essentially avoiding locking between
  physical CPUs.  Stats readers still need to read/lock across the
  CPUs, but now once instead of 16 times in the test case described
  above.  (A sketch of both changes follows this list.)
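For illustration, here is a minimal sketch of the idea; the layout and
the names (flow_stats_read, the 'stats' array, etc.) are simplified
for this cover letter and do not match the patched datapath code
exactly:

#include <linux/types.h>
#include <linux/spinlock.h>
#include <linux/nodemask.h>

/* Per-flow stats kept per NUMA node instead of per CPU. */
struct flow_stats {
        u64 packet_count;
        u64 byte_count;
        unsigned long used;     /* Last use time (jiffies); 0 if never used. */
        spinlock_t lock;
};

struct sw_flow {
        /* ... key, mask, actions, etc. ... */
        struct flow_stats *stats[MAX_NUMNODES]; /* One slot per node. */
};

/* Sum the stats over all NUMA nodes, skipping all-zero entries so
 * that readers do not take (and bounce) the locks of unused slots. */
static void flow_stats_read(const struct sw_flow *flow,
                            u64 *packets, u64 *bytes)
{
        int node;

        *packets = *bytes = 0;
        for_each_node(node) {
                struct flow_stats *stats = flow->stats[node];

                /* Unused stats remain all-zero, so an unlocked check
                 * of 'used' is enough to skip them safely. */
                if (!stats || !stats->used)
                        continue;

                spin_lock_bh(&stats->lock);
                *packets += stats->packet_count;
                *bytes += stats->byte_count;
                spin_unlock_bh(&stats->lock);
        }
}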

To avoid performance regressions elsewhere, this series also
introduces prefetching so that neither readers nor writers stall on
stats that are out of the L1 cache.
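As above, this is only an illustrative sketch of the technique;
prefetchw() and numa_node_id() are the stock kernel helpers, while
flow_stats_prefetch() is a hypothetical name for this cover letter:

#include <linux/prefetch.h>
#include <linux/topology.h>

/* Hint the CPU to pull this node's stats cache line in early, so the
 * line is (hopefully) resident by the time the stats update runs.
 * prefetchw() requests the line in a writable state, which suits the
 * writer side; plain prefetch() would do for read-only access. */
static inline void flow_stats_prefetch(const struct sw_flow *flow)
{
        const struct flow_stats *stats = flow->stats[numa_node_id()];

        if (stats)
                prefetchw(stats);
}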

With all of these applied, OVS kernel-side locking no longer appears
near the top of 'perf' reports.

The effectiveness of these strategies under different load scenarios
requires more testing.

Jarno Rajahalme (4):
  datapath/flow.c: Only read stats if non-zero.
  datapath: NUMA node flow stats.
  datapath: Prefetch flow stats.
  datapath: Update flow stats after execute_actions.

 datapath/datapath.c   |   47 ++++++++++++++++++++++-
 datapath/flow.c       |  101 ++++++++++++++++---------------------------------
 datapath/flow.h       |    8 ++--
 datapath/flow_table.c |   44 +++++++++++++--------
 datapath/flow_table.h |    2 +
 5 files changed, 114 insertions(+), 88 deletions(-)

-- 
1.7.10.4
