On 07/22/2015 10:09 AM, Kaixu Xia wrote:
Previous patch v1 URL:
https://lkml.org/lkml/2015/7/17/287

[ Sorry to chime in late, just noticed this series now as I wasn't in Cc for
  the core BPF changes. More below ... ]

This patchset allows users to read PMU events in the following way:
  1. Open the PMU events using perf_event_open() (one per CPU, or one
     per process to be watched);
  2. Create a BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map;
  3. Insert the event FDs into the map using some key-value mapping
     scheme (e.g. cpuid -> event on that CPU);
  4. Load and attach eBPF programs as usual;
  5. In the eBPF program, take the perf_event_map_fd and a key (e.g.
     the CPU id obtained from bpf_get_smp_processor_id()) and call
     bpf_perf_event_read() to read the counter;
  6. Use the value however needed.
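
The user-space side of steps 1-3 can be sketched roughly as follows. This is a minimal sketch, assuming the uapi that was eventually merged upstream (BPF_MAP_TYPE_PERF_EVENT_ARRAY with 4-byte keys and 4-byte values holding the perf event fd); the v2 patches may differ in detail. The helper names (make_cycles_attr(), make_perf_array_attr(), open_cycles_on_cpu(), create_perf_array(), store_event_fd()) are hypothetical, and steps 4-5 (loading the eBPF program and calling bpf_perf_event_read() inside it) are not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/bpf.h>
#include <linux/perf_event.h>

/* Step 1 (assumed config): count CPU cycles, a PERF_TYPE_HARDWARE event. */
struct perf_event_attr make_cycles_attr(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    return attr;
}

/* pid = -1, cpu = cpu: count all tasks on that CPU; returns fd or -1. */
int open_cycles_on_cpu(int cpu)
{
    struct perf_event_attr attr = make_cycles_attr();

    return (int)syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
}

/* Step 2 (assumed layout): one slot per CPU, u32 key, u32 value (the fd). */
union bpf_attr make_perf_array_attr(uint32_t nr_cpus)
{
    union bpf_attr attr;

    assert(nr_cpus > 0);
    memset(&attr, 0, sizeof(attr));
    attr.map_type = BPF_MAP_TYPE_PERF_EVENT_ARRAY;
    attr.key_size = sizeof(uint32_t);
    attr.value_size = sizeof(uint32_t);
    attr.max_entries = nr_cpus;
    return attr;
}

/* Returns the new map fd, or -1 on error. */
int create_perf_array(uint32_t nr_cpus)
{
    union bpf_attr attr = make_perf_array_attr(nr_cpus);

    return (int)syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}

/* Step 3: store the mapping cpuid -> perf event fd opened on that CPU. */
int store_event_fd(int map_fd, uint32_t cpu, uint32_t event_fd)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)map_fd;
    attr.key = (uint64_t)(uintptr_t)&cpu;
    attr.value = (uint64_t)(uintptr_t)&event_fd;
    attr.flags = BPF_ANY;
    return (int)syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
```

A real tool would loop over the online CPUs, call open_cycles_on_cpu(cpu) and store_event_fd(map_fd, cpu, fd) for each, and check every return value for -1; both syscalls typically need elevated privileges.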

Changes in v2:
  - put atomic_long_inc_not_zero() between fdget() and fdput();
  - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
  - only read the event counter on the current CPU or for the current
    process;
  - add a new map type, BPF_MAP_TYPE_PERF_EVENT_ARRAY, to store
    pointers to struct perf_event;
  - bpf_perf_event_read() returns the hardware PMU counter value
    selected by perf_event_map_fd and key;

Patch 5/5 is a simple example showing how to use this new eBPF
capability. The PMU counter data (the cycles count read each time
'kprobe/sys_write' fires) can be found in
/sys/kernel/debug/tracing/trace (or trace_pipe):

   $ cat /sys/kernel/debug/tracing/trace_pipe
   $ ./tracex6
        ...
              cat-677   [002] d..1   210.299270: : bpf count: CPU-2  5316659
              cat-677   [002] d..1   210.299316: : bpf count: CPU-2  5378639
              cat-677   [002] d..1   210.299362: : bpf count: CPU-2  5440654
              cat-677   [002] d..1   210.299408: : bpf count: CPU-2  5503211
              cat-677   [002] d..1   210.299454: : bpf count: CPU-2  5565438
              cat-677   [002] d..1   210.299500: : bpf count: CPU-2  5627433
              cat-677   [002] d..1   210.299547: : bpf count: CPU-2  5690033
              cat-677   [002] d..1   210.299593: : bpf count: CPU-2  5752184
              cat-677   [002] d..1   210.299639: : bpf count: CPU-2  5814543
            <...>-548   [009] d..1   210.299667: : bpf count: CPU-9  605418074
            <...>-548   [009] d..1   210.299692: : bpf count: CPU-9  605452692
              cat-677   [002] d..1   210.299700: : bpf count: CPU-2  5896319
            <...>-548   [009] d..1   210.299710: : bpf count: CPU-9  605477824
            <...>-548   [009] d..1   210.299728: : bpf count: CPU-9  605501726
            <...>-548   [009] d..1   210.299745: : bpf count: CPU-9  605525279
            <...>-548   [009] d..1   210.299762: : bpf count: CPU-9  605547817
            <...>-548   [009] d..1   210.299778: : bpf count: CPU-9  605570433
            <...>-548   [009] d..1   210.299795: : bpf count: CPU-9  605592743
        ...

The patches in detail:

Patch 1/5 introduces a new bpf map type that only stores pointers to
struct perf_event;

Patch 2/5 introduces a map_traverse_elem() function for further use;

Patch 3/5 converts event file descriptors into struct perf_event pointers
when new elements are added to the map;

So far all the map backends are generic in nature, knowing absolutely nothing
about a particular consumer/subsystem of eBPF (tc, socket filters, etc.). The
tail call is a bit special, but nevertheless generic for each user and [very]
useful, so it makes sense to inherit from the array map and move the code there.

I don't really like that we start adding new _special_-cased maps here into
the eBPF core code; it seems quite hacky. :( From your rather terse commit
description where you introduce the maps, I failed to see a detailed
elaboration on this, i.e. why can it not be abstracted differently?

Patch 4/5 implements bpf_perf_event_read(), which reads the selected
hardware PMU counter;

Patch 5/5 gives a simple example.

Kaixu Xia (5):
   bpf: Add new bpf map type to store the pointer to struct perf_event
   bpf: Add function map->ops->map_traverse_elem() to traverse map elems
   bpf: Save the pointer to struct perf_event to map
   bpf: Implement function bpf_perf_event_read() that get the selected
     hardware PMU counter
   samples/bpf: example of get selected PMU counter value

  include/linux/bpf.h        |   6 +++
  include/linux/perf_event.h |   5 ++-
  include/uapi/linux/bpf.h   |   3 ++
  kernel/bpf/arraymap.c      | 110 +++++++++++++++++++++++++++++++++++++++++++++
  kernel/bpf/helpers.c       |  42 +++++++++++++++++
  kernel/bpf/syscall.c       |  26 +++++++++++
  kernel/events/core.c       |  30 ++++++++++++-
  kernel/trace/bpf_trace.c   |   2 +
  samples/bpf/Makefile       |   4 ++
  samples/bpf/bpf_helpers.h  |   2 +
  samples/bpf/tracex6_kern.c |  27 +++++++++++
  samples/bpf/tracex6_user.c |  67 +++++++++++++++++++++++++++
  12 files changed, 321 insertions(+), 3 deletions(-)
  create mode 100644 samples/bpf/tracex6_kern.c
  create mode 100644 samples/bpf/tracex6_user.c
