Hi,

I investigated the feasibility of optimizing `fetcharg` in probe events
by converting it to BPF. The results look promising: it reduces the
argument-fetching overhead by about 30% (and possibly more with more
than 3 arguments).

I had not expected such a big difference, because I assumed the major
source of overhead was unsafe pointer dereferencing (e.g.
copy_from_kernel_nofault()). Indeed, without CONFIG_BPF_JIT the overhead
is more than double that of the legacy fetcharg. With the JIT compiler,
however, the BPF version performs better.

The basic concept is quite simple. Parsing is unchanged up to the point
where user input has been converted into `fetcharg` code. Some of the
fundamental `fetcharg` operations are then converted into an equivalent
sequence of BPF instructions, producing a single `bpf_prog` for each
probe event (rather than one per argument). This program executes within
the event handler, reads `pt_regs` directly, and stores the results in
the ftrace ring buffer, just as `fetcharg` does.
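
To illustrate the idea of "one program per event", here is a userspace
toy model. It is only a sketch: every `toy_*` name is hypothetical, the
types are not the kernel's `struct fetch_insn` or `struct bpf_insn`, and
the memory load is a plain dereference rather than a nofault read. It
only shows the fetch code of all arguments being flattened into one
instruction stream that ends with a single exit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-argument fetch operations (toy stand-ins). */
enum toy_fetch_op { TOY_FETCH_REG, TOY_FETCH_DEREF, TOY_FETCH_END };
struct toy_fetch_insn { enum toy_fetch_op op; long param; };

/* Hypothetical flat program operations (toy stand-ins for BPF insns). */
enum toy_prog_op { TOY_PROG_LOAD_REG, TOY_PROG_LOAD_MEM,
		   TOY_PROG_STORE, TOY_PROG_EXIT };
struct toy_prog_insn { enum toy_prog_op op; long param; };

/* Append one instruction; returns -1 when the output buffer is full. */
static int toy_emit(struct toy_prog_insn *out, size_t max, size_t *n,
		    enum toy_prog_op op, long param)
{
	if (*n >= max)
		return -1;
	out[*n].op = op;
	out[*n].param = param;
	(*n)++;
	return 0;
}

/*
 * Translate the fetch code of every argument into ONE flat program.
 * Returns the number of instructions emitted, or -1 on an unsupported
 * operation or when the buffer is too small.
 */
static int toy_compile(const struct toy_fetch_insn *const *args,
		       size_t nr_args, struct toy_prog_insn *out, size_t max)
{
	size_t n = 0;

	assert(args && out);
	for (size_t i = 0; i < nr_args; i++) {
		const struct toy_fetch_insn *f;

		for (f = args[i]; f->op != TOY_FETCH_END; f++) {
			enum toy_prog_op op;

			if (f->op == TOY_FETCH_REG)
				op = TOY_PROG_LOAD_REG;
			else if (f->op == TOY_FETCH_DEREF)
				op = TOY_PROG_LOAD_MEM;
			else
				return -1;	/* unsupported operation */
			if (toy_emit(out, max, &n, op, f->param))
				return -1;
		}
		/* Store the fetched value into this argument's slot. */
		if (toy_emit(out, max, &n, TOY_PROG_STORE, (long)i))
			return -1;
	}
	if (toy_emit(out, max, &n, TOY_PROG_EXIT, 0))
		return -1;
	return (int)n;
}

/* Interpret the flat program against a fake register file. */
static void toy_run(const struct toy_prog_insn *prog, const intptr_t *regs,
		    intptr_t *slots)
{
	intptr_t val = 0;

	for (;; prog++) {
		switch (prog->op) {
		case TOY_PROG_LOAD_REG:
			val = regs[prog->param];
			break;
		case TOY_PROG_LOAD_MEM:
			/* Plain load here; a real probe needs a safe read. */
			val = *(const intptr_t *)(val + prog->param);
			break;
		case TOY_PROG_STORE:
			slots[prog->param] = val;
			break;
		case TOY_PROG_EXIT:
			return;
		}
	}
}
```

In this model an event with two arguments (a register value, and a
register value dereferenced at an offset) compiles to six instructions
in one program: one load and one store for the first argument, two loads
and one store for the second, and a single exit.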

Here are the benchmark results, measured in a QEMU (KVM) guest on an
Intel Core i7-8565U.

When enabling BPF with JIT:
--------------------------------------------------------------------------------
Configuration   0 Fetchargs   1 Fetcharg  2 Fetchargs  3 Fetchargs
--------------------------------------------------------------------------------
Baseline          298882359            -            -            -  loops/sec
                          -            -            -            -  overhead
Kprobe              9740841      8664195      7944956      7608274  loops/sec
                   99.31 ns     12.76 ns     23.21 ns     28.78 ns  overhead
Fprobe             10827749      9220918      7992512      7683757  loops/sec
                   89.01 ns     16.09 ns     32.76 ns     37.79 ns  overhead
Eprobe              6746389      6245994      5319037      4845406  loops/sec
                  144.88 ns     11.88 ns     39.78 ns     58.15 ns  overhead
--------------------------------------------------------------------------------

When enabling BPF without JIT:
--------------------------------------------------------------------------------
Configuration   0 Fetchargs   1 Fetcharg  2 Fetchargs  3 Fetchargs
--------------------------------------------------------------------------------
Baseline           84067374            -            -            -  loops/sec
                          -            -            -            -  overhead
Kprobe              7092949      5834913      3848776      3443408  loops/sec
                  129.09 ns     30.40 ns    118.84 ns    149.42 ns  overhead
Fprobe              9426302      6441734      4350313      3710814  loops/sec
                   94.19 ns     49.15 ns    123.78 ns    163.40 ns  overhead
Eprobe              5681716      4958113      3940999      3953434  loops/sec
                  164.11 ns     25.69 ns     77.74 ns     76.94 ns  overhead
--------------------------------------------------------------------------------

When disabling BPF (legacy fetcharg):
--------------------------------------------------------------------------------
Configuration   0 Fetchargs   1 Fetcharg  2 Fetchargs  3 Fetchargs
--------------------------------------------------------------------------------
Baseline          245433525            -            -            -  loops/sec
                          -            -            -            -  overhead
Kprobe              9055348      8488351      7219595      6453928  loops/sec
                  106.36 ns      7.38 ns     28.08 ns     44.51 ns  overhead
Fprobe             10859326      9288801      7492518      6607046  loops/sec
                   88.01 ns     15.57 ns     41.38 ns     59.27 ns  overhead
Eprobe              6987128      5114526      5055084      4803759  loops/sec
                  139.05 ns     52.40 ns     54.70 ns     65.05 ns  overhead
--------------------------------------------------------------------------------

The numbers are still unstable (because of a problem in the benchmark),
but the trend shows that BPF+JIT is the winner.
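
For readers checking the tables: the overhead figures appear to be the
difference in time per loop between two loops/sec measurements. The
"0 Fetchargs" overhead is taken relative to the baseline, and the
1 to 3 fetcharg overheads relative to the same probe with 0 fetchargs.
This is inferred from the numbers above, not taken from
bench_fetcharg.sh, and the helper names below are made up for this
sketch.

```c
#include <assert.h>

/* Average time per loop iteration, in nanoseconds. */
static double ns_per_loop(double loops_per_sec)
{
	assert(loops_per_sec > 0.0);
	return 1e9 / loops_per_sec;
}

/*
 * Extra time per loop of `measured_lps` compared with `reference_lps`,
 * both given in loops/sec. Assumed derivation of the "overhead" rows.
 */
static double overhead_ns(double reference_lps, double measured_lps)
{
	return ns_per_loop(measured_lps) - ns_per_loop(reference_lps);
}
```

With the JIT-enabled kprobe row, overhead_ns(298882359, 9740841) gives
about 99.31 ns for the probe itself, and overhead_ns(9740841, 8664195)
gives about 12.76 ns for the first fetcharg, matching the table.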

TODOs:
 - Add a new Kconfig option that depends on CONFIG_BPF_JIT=y.
 - Continue processing subsequent arguments even if a single dereference
   operation fails.
 - Allow mixing with unsupported FETCH_OPs on the same event.

Thank you,

---
base-commit: c0c56fe6fb52cfb28419242cfa6235125f818f94

Masami Hiramatsu (Google) (4):
      tools/tracing: Add fetcharg performance micro-benchmark
      tracing/probes: Compile all fetchargs into a single BPF program per event
      tracing: Add disable_bpf trace option to ignore eBPF for fetchargs
      selftests/ftrace: Add a test for eBPF compiled fetchargs


 kernel/trace/trace.c                               |    7 +
 kernel/trace/trace.h                               |    8 +
 kernel/trace/trace_probe.c                         |  249 ++++++++++++++++++++
 kernel/trace/trace_probe.h                         |   15 +
 kernel/trace/trace_probe_tmpl.h                    |   13 +
 .../ftrace/test.d/dynevent/test_bpf_fetchargs.tc   |   51 ++++
 tools/tracing/benchmark/Kbuild                     |    3 
 tools/tracing/benchmark/Makefile                   |   12 +
 tools/tracing/benchmark/bench_fetcharg.sh          |  195 ++++++++++++++++
 tools/tracing/benchmark/fetcharg_bench.c           |   98 ++++++++
 tools/tracing/benchmark/fetcharg_bench_trace.h     |   37 +++
 11 files changed, 684 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc
 create mode 100644 tools/tracing/benchmark/Kbuild
 create mode 100644 tools/tracing/benchmark/Makefile
 create mode 100755 tools/tracing/benchmark/bench_fetcharg.sh
 create mode 100644 tools/tracing/benchmark/fetcharg_bench.c
 create mode 100644 tools/tracing/benchmark/fetcharg_bench_trace.h

--
Masami Hiramatsu (Google) <[email protected]>
