On 7/18/25 8:05 AM, Matt Fleming wrote:
From: Matt Fleming <mflem...@cloudflare.com>

Add benchmarks for the standard set of operations: lookup, update,
delete. Also, include a benchmark for trie_free() which is known to have
terrible performance for maps with many entries.

Benchmarks operate on tries without gaps in the key range, i.e. each
test begins with a trie with valid keys in the range [0, nr_entries).
This is intended to cause maximum branching when traversing the trie.

All measurements are recorded inside the kernel to remove syscall
overhead.

Most benchmarks run an XDP program to generate stats but free needs to
collect latencies using fentry/fexit on map_free_deferred() because it's
not possible to use fentry directly on lpm_trie.c since commit
c83508da5620 ("bpf: Avoid deadlock caused by nested kprobe and fentry
bpf programs") and there's no way to create/destroy a map from within an
XDP program.

Here is example output from an AMD EPYC 9684X 96-Core machine for each
of the benchmarks using a trie with 10K entries and a 32-bit prefix
length, e.g.

   $ ./bench lpm-trie-$op \
        --prefix_len=32  \
        --producers=1     \
        --nr_entries=10000

   lookup: throughput    7.423 ± 0.023 M ops/s (  7.423M ops/prod), latency  
134.710 ns/op
   update: throughput    2.643 ± 0.015 M ops/s (  2.643M ops/prod), latency  
378.310 ns/op
   delete: throughput    0.712 ± 0.008 M ops/s (  0.712M ops/prod), latency 
1405.152 ns/op
     free: throughput    0.574 ± 0.003 K ops/s (  0.574K ops/prod), latency    
1.743 ms/op

Signed-off-by: Matt Fleming <mflem...@cloudflare.com>
---
  tools/testing/selftests/bpf/Makefile          |   2 +
  tools/testing/selftests/bpf/bench.c           |  10 +
  tools/testing/selftests/bpf/bench.h           |   1 +
  .../selftests/bpf/benchs/bench_lpm_trie_map.c | 345 ++++++++++++++++++
  .../selftests/bpf/progs/lpm_trie_bench.c      | 175 +++++++++
  .../selftests/bpf/progs/lpm_trie_map.c        |  19 +
  6 files changed, 552 insertions(+)
  create mode 100644 tools/testing/selftests/bpf/benchs/bench_lpm_trie_map.c
  create mode 100644 tools/testing/selftests/bpf/progs/lpm_trie_bench.c
  create mode 100644 tools/testing/selftests/bpf/progs/lpm_trie_map.c

[...]

+
+static __always_inline double duration_ms(struct bench_res *res)
+{
+       if (!res->hits)
+               return 0.0;
+
+       return res->duration_ns / res->hits / NSEC_PER_MSEC;
+}

The above function 'duration_ms' is not used.

+
+static void free_ops_report_progress(int iter, struct bench_res *res,
+                                    long delta_ns)
+{
+       double hits_per_sec, hits_per_prod;
+       double rate_divisor = 1000.0;
+       char rate = 'K';
+
+       hits_per_sec = res->hits / (res->duration_ns / (double)NSEC_PER_SEC) /
+                      rate_divisor;
+       hits_per_prod = hits_per_sec / env.producer_cnt;
+
+       printf("Iter %3d (%7.3lfus): ", iter,
+              (delta_ns - NSEC_PER_SEC) / 1000.0);
+       printf("hits %8.3lf%c/s (%7.3lf%c/prod)\n", hits_per_sec, rate,
+              hits_per_prod, rate);
+}
+

[...]


Reply via email to