On Thu, Feb 6, 2014 at 3:13 PM, Jarno Rajahalme <jrajaha...@nicira.com> wrote:
> Keep kernel flow stats for each NUMA node rather than for each
> (logical) CPU.  This avoids using the per-CPU allocator and removes
> most of the kernel-side OVS locking overhead otherwise at the top of
> perf reports, and allows OVS to scale better with a higher number of
> threads.
>
> With 9 handlers and 4 revalidators, the netperf TCP_CRR flow setup
> rate doubles on a server with two hyper-threaded physical CPUs (16
> logical cores each) compared to the current OVS master.  Tested with a
> non-trivial flow table with a TCP port match rule forcing all new
> connections with unique port numbers to OVS userspace.  The IP
> addresses are still wildcarded, so the kernel flows are not considered
> exact-match 5-tuple flows.  Flows of this type can be expected to
> appear in large numbers as a result of the more effective wildcarding
> made possible by improvements in the OVS userspace flow classifier.
>
> Perf results for this test (master):
>
> Events: 305K cycles
> +   8.43%  ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
> +   5.64%  ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
> +   4.75%  ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
> +   3.32%  ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
> +   2.61%  ovs-vswitchd  [kernel.kallsyms]   [k] pcpu_alloc_area
> +   2.19%  ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
> +   2.03%  swapper       [kernel.kallsyms]   [k] intel_idle
> +   1.84%  ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
> +   1.64%  ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
> +   1.58%  ovs-vswitchd  libc-2.15.so        [.] 0x7f4e6
> +   1.07%  ovs-vswitchd  [kernel.kallsyms]   [k] memset
> +   1.03%  netperf       [kernel.kallsyms]   [k] __ticket_spin_lock
> +   0.92%  swapper       [kernel.kallsyms]   [k] __ticket_spin_lock
> ...
>
> And after this patch:
>
> Events: 356K cycles
> +   6.85%  ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
> +   4.63%  ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
> +   3.06%  ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
> +   2.81%  ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
> +   2.51%  ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
> +   2.27%  ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
> +   1.84%  ovs-vswitchd  libc-2.15.so        [.] 0x15d30f
> +   1.74%  ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
> +   1.47%  swapper       [kernel.kallsyms]   [k] intel_idle
> +   1.34%  ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask
> +   1.33%  ovs-vswitchd  ovs-vswitchd        [.] rule_actions_unref
> +   1.16%  ovs-vswitchd  ovs-vswitchd        [.] hindex_node_with_hash
> +   1.16%  ovs-vswitchd  ovs-vswitchd        [.] do_xlate_actions
> +   1.09%  ovs-vswitchd  ovs-vswitchd        [.] ofproto_rule_ref
> +   1.01%  netperf       [kernel.kallsyms]   [k] __ticket_spin_lock
> ...
>
> There is a small increase in kernel spinlock overhead due to the same
> spinlock now being shared between multiple cores of the same physical
> CPU, but that is barely visible in netperf TCP_CRR performance (maybe
> a ~1% drop, hard to tell exactly due to variance in the test results)
> when testing kernel module throughput (no userspace activity, only a
> handful of kernel flows).
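(To make the data layout concrete, here is a rough, illustrative sketch
of per-NUMA-node flow stats in kernel-style C.  The struct and field
names below are mine, not necessarily those used in the patch.)

#include <linux/types.h>	/* u64, __be16 */
#include <linux/spinlock.h>	/* spinlock_t */
#include <linux/rcupdate.h>	/* __rcu, struct rcu_head */

/* Stats shared by all CPUs on one NUMA node; the spinlock is only
 * contended by cores of the same physical CPU, never across nodes. */
struct flow_stats {
	u64 packet_count;	/* Number of packets matched. */
	u64 byte_count;		/* Number of bytes matched. */
	unsigned long used;	/* Last used time (in jiffies). */
	spinlock_t lock;	/* Guards updates from this node. */
	__be16 tcp_flags;	/* Union of seen TCP flags. */
};

struct sw_flow {
	struct rcu_head rcu;	/* Deferred destruction. */
	/* ... key, mask, actions, etc. ... */

	/* One slot per possible NUMA node; only the node-0 instance is
	 * preallocated at flow setup, the rest start out NULL and are
	 * filled in lazily by CPUs on other nodes. */
	struct flow_stats __rcu *stats[];
};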
> On flow setup, a single stats instance is allocated (for NUMA node 0).
> As CPUs from multiple NUMA nodes start updating stats, new
> NUMA-node-specific stats instances are allocated.  This allocation on
> the packet processing code path is made to never sleep or look for
> emergency memory pools, minimizing the allocation latency.  If the
> allocation fails, the existing preallocated stats instance is used.
> Also, if only CPUs from one NUMA node are updating the preallocated
> stats instance, no additional stats instances are allocated.  This
> eliminates the need to pre-allocate stats instances that will not be
> used, and also relieves the stats reader from the burden of reading
> stats that are never used.  Finally, this allocation strategy allows
> the removal of the existing exact-5-tuple heuristics.
>
> Signed-off-by: Jarno Rajahalme <jrajaha...@nicira.com>

Looks good.
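(Building on the layout sketched above, the lazy, non-sleeping
allocation with fallback to the preallocated node-0 instance could look
roughly like the following.  The function, cache name, and GFP flags
here are illustrative, and a real implementation also has to serialize
concurrent allocations for the same node, e.g. under the node-0
instance's lock.)

#include <linux/gfp.h>
#include <linux/jiffies.h>
#include <linux/skbuff.h>
#include <linux/slab.h>
#include <linux/topology.h>	/* numa_node_id() */

/* Assumed to be created at module init; the name is illustrative. */
static struct kmem_cache *flow_stats_cache;

/* Called on the packet processing path with rcu_read_lock() held. */
static void flow_stats_update(struct sw_flow *flow, const struct sk_buff *skb)
{
	int node = numa_node_id();
	struct flow_stats *stats = rcu_dereference(flow->stats[node]);

	if (unlikely(!stats)) {
		/* Never sleep and never dip into emergency pools on the
		 * packet processing path. */
		stats = kmem_cache_alloc_node(flow_stats_cache,
					      GFP_NOWAIT | __GFP_NOMEMALLOC,
					      node);
		if (likely(stats)) {
			spin_lock_init(&stats->lock);
			stats->packet_count = 0;
			stats->byte_count = 0;
			stats->tcp_flags = 0;
			rcu_assign_pointer(flow->stats[node], stats);
		} else {
			/* Allocation failed: fall back to the stats
			 * instance preallocated for node 0 at flow
			 * setup. */
			node = 0;
			stats = rcu_dereference(flow->stats[0]);
		}
	}

	spin_lock(&stats->lock);
	stats->used = jiffies;
	stats->packet_count++;
	stats->byte_count += skb->len;
	spin_unlock(&stats->lock);
}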
Acked-by: Pravin B Shelar <pshe...@nicira.com>

Thanks.