From: Alexei Starovoitov <a...@fb.com>
Date: Mon, 1 Feb 2016 22:39:52 -0800
> We've started to use bpf to trace every packet, and the atomic add
> instruction (even JITed) started to show up in perf profiles.
> The solution is to use per-cpu counters.
> For PERCPU_(HASH|ARRAY) maps the existing bpf_map_lookup() helper
> returns a per-cpu area which bpf programs can use to store and
> increment the counters. The BPF_MAP_LOOKUP_ELEM syscall command
> returns the areas from all cpus and the user process aggregates
> the counters. A usage example is in patch 6. The api turned out
> to be very easy to use both from bpf programs and from user space.
> Long term we have been discussing adding a 'bounded loop'
> instruction, so bpf programs can do the aggregation within the
> program, which may help some use cases. Right now user space
> aggregation of per-cpu counters fits best.
>
> This patch set is a new approach for per-cpu hash and array maps.
> I've reused the map tests written by Martin and Ming, but the
> implementation and api are new. Old discussion here:
> http://thread.gmane.org/gmane.linux.kernel/2123800/focus=2126435

Series applied, thanks Alexei.