On Fri, Sep 9, 2016 at 1:41 PM, Thadeu Lima de Souza Cascardo
<casca...@redhat.com> wrote:
> Instead of using flow stats per NUMA node, use it per CPU. When using
> megaflows, the stats lock can be a bottleneck in scalability.
>
> On a E5-2690 12-core system, usual throughput went from ~4Mpps to
> ~15Mpps when forwarding between two 40GbE ports with a single flow
> configured on the datapath.
>
> This has been tested on a system with possible CPUs 0-7,16-23. After
> module removal, there were no corruption on the slab cache.
>
> Signed-off-by: Thadeu Lima de Souza Cascardo <casca...@redhat.com>
> ---
>  net/openvswitch/flow.c       | 43 +++++++++++++++++++++++--------------------
>  net/openvswitch/flow.h       |  4 ++--
>  net/openvswitch/flow_table.c | 23 ++++++++++++-----------
>  3 files changed, 37 insertions(+), 33 deletions(-)
>
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index 3609f37..2970a9f 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -29,6 +29,7 @@
>  #include <linux/module.h>
>  #include <linux/in.h>
>  #include <linux/rcupdate.h>
> +#include <linux/cpumask.h>
>  #include <linux/if_arp.h>
>  #include <linux/ip.h>
>  #include <linux/ipv6.h>
> @@ -72,32 +73,33 @@ void ovs_flow_stats_update(struct sw_flow *flow, __be16 tcp_flags,
>  {
>  	struct flow_stats *stats;
>  	int node = numa_node_id();
> +	int cpu = get_cpu();
>  	int len = skb->len + (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
>

This function is always called from BH context, so calling
smp_processor_id() to get the CPU id is fine. There is no need to
handle preemption here.
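
To illustrate the difference, here is a toy userspace model, not kernel
code. In the kernel, get_cpu() disables preemption before returning the
CPU id and has to be paired with put_cpu(), while smp_processor_id()
only reads the id. All model_* names, the fixed CPU number 3 and the
depth counter are assumptions made up for this sketch.

```c
#include <assert.h>

/* Toy stand-ins for kernel state; none of this is the real implementation. */
static int model_preempt_depth;          /* nesting depth of "preemption disabled" */
static const int model_current_cpu = 3;  /* pretend we are running on CPU 3 */

/* Models smp_processor_id(): reads the CPU id, changes nothing. */
static int model_smp_processor_id(void)
{
	return model_current_cpu;
}

/* Models get_cpu(): disable preemption, then read the CPU id. */
static int model_get_cpu(void)
{
	model_preempt_depth++;
	return model_smp_processor_id();
}

/* Models put_cpu(): re-enable preemption; must balance a get_cpu(). */
static void model_put_cpu(void)
{
	assert(model_preempt_depth > 0);
	model_preempt_depth--;
}

/* Preemptible-context pattern: bracket the per-CPU access. */
static int pick_slot_preemptible(void)
{
	int cpu = model_get_cpu();
	/* ... per-CPU data would be updated here ... */
	model_put_cpu();
	return cpu;
}

/* BH-context pattern: the task cannot migrate, so a plain read is enough. */
static int pick_slot_bh(void)
{
	return model_smp_processor_id();
}
```

Both helpers return the same slot. The BH variant never touches the
depth counter, so there is nothing to unbalance on any return path.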
> -	stats = rcu_dereference(flow->stats[node]);
> +	stats = rcu_dereference(flow->stats[cpu]);
>
...
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index 957a3c3..60e5ae0 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
...
> @@ -102,9 +103,9 @@ struct sw_flow *ovs_flow_alloc(void)
>
>  	RCU_INIT_POINTER(flow->stats[0], stats);
>
> -	for_each_node(node)
> -		if (node != 0)
> -			RCU_INIT_POINTER(flow->stats[node], NULL);
> +	for_each_possible_cpu(cpu)
> +		if (cpu != 0)
> +			RCU_INIT_POINTER(flow->stats[cpu], NULL);
>

I think at this point we should just use the __GFP_ZERO flag when
allocating struct sw_flow, so the NULL-initialization loop is no
longer needed.
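
A rough userspace sketch of the idea, with calloc() standing in for a
zeroing allocation (in the kernel that would be __GFP_ZERO or
kmem_cache_zalloc()). The *_model types, NR_SLOTS and the helper names
are assumptions for illustration only; the real struct sw_flow sizes
its stats array differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_SLOTS 8 /* hypothetical stand-in for the number of possible CPUs */

struct flow_stats_model {
	unsigned long packets;
	unsigned long bytes;
};

struct sw_flow_model {
	struct flow_stats_model *stats[NR_SLOTS];
};

/*
 * Allocate a flow with every stats slot already NULL.
 * Assumes NULL is all-bits-zero, as it is on Linux and in the kernel.
 */
static struct sw_flow_model *flow_alloc_zeroed(struct flow_stats_model *first)
{
	struct sw_flow_model *flow = calloc(1, sizeof(*flow));

	if (!flow)
		return NULL;
	flow->stats[0] = first; /* only slot 0 is populated up front */
	return flow;
}

/* Count the slots that are still NULL after allocation. */
static int flow_count_null_slots(const struct sw_flow_model *flow)
{
	int i, n = 0;

	for (i = 0; i < NR_SLOTS; i++)
		if (flow->stats[i] == NULL)
			n++;
	return n;
}
```

With the allocation zeroed, only slot 0 needs an explicit store and no
per-CPU loop runs at allocation time.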