This patch set introduces initial control flow infrastructure to the eBPF
verifier. The various pieces of control flow information are built
incrementally, so they can easily be maintained at various levels.

After this patch set, information collection on eBPF insns is centered in
check_subprogs; check_cfg and related code are replaced by the new
infrastructure.

The checks and analyses offered by check_cfg are still offered by the new
infrastructure, for example loop detection, recursive function call
detection, unreachable insn detection and prune heuristic marking.
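
On loop detection specifically: once the dominator tree (patch 3) is
available, finding a loop reduces to spotting a back edge, i.e. an edge
u -> v where v dominates u. Below is a minimal standalone sketch of that
idea; the struct and helper names are illustrative, not the ones used in
cfg.c, and irreducible loops (cycles without such a back edge) would need
extra handling on top of this.

  #include <stdbool.h>

  #define MAX_SUCCS 2                   /* a BPF insn has at most two targets */

  struct bb_node {
          int nr_succs;
          int succs[MAX_SUCCS];         /* successor block indexes */
          int idom;                     /* immediate dominator, -1 for entry */
  };

  /* Does block a dominate block b?  Walk b's dominator chain upwards. */
  static bool bb_dominates(const struct bb_node *bbs, int a, int b)
  {
          while (b != -1) {
                  if (b == a)
                          return true;
                  b = bbs[b].idom;
          }
          return false;
  }

  /* A back edge u -> v, with v dominating u, means the CFG has a loop. */
  static bool cfg_has_loop(const struct bb_node *bbs, int nr_bbs)
  {
          for (int u = 0; u < nr_bbs; u++)
                  for (int i = 0; i < bbs[u].nr_succs; i++)
                          if (bb_dominates(bbs, bbs[u].succs[i], u))
                                  return true;
          return false;
  }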

This infrastructure comes with some memory and execution time cost.
Memory-pool-based allocation is used to avoid frequent calls to
kmalloc/kfree, and singly linked lists plus compact pointers are used to
reduce the size of the various CFG nodes.
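
The memory pool idea is the usual one: grab a large chunk up front, hand
out fixed-size nodes by bumping an offset, and free the whole chunk when
verification ends. A rough sketch with hypothetical names (the allocator
in cfg.c is shaped differently):

  struct cfg_node_allocator {
          char *chunk;          /* current kmalloc'ed chunk */
          int used;             /* bytes already handed out */
          int size;             /* chunk capacity */
  };

  static void *cfg_node_alloc(struct cfg_node_allocator *a, int bytes)
  {
          void *p;

          if (a->used + bytes > a->size)
                  return NULL;  /* real code would chain in a new chunk */

          p = a->chunk + a->used;
          a->used += bytes;
          return p;
  }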

data structure size
===
  basic block      edge        call-graph edge
  16 bytes         24 bytes    6 bytes
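
To give a feel for how such sizes are reachable on a 64-bit kernel:
8-byte native pointers are replaced by 16-bit offsets into the memory
pool ("compact pointers"), and doubly linked list_heads by a single
16-bit next link. The layouts below are hypothetical illustrations that
hit the basic-block and call-graph-edge sizes; the actual cfg.c fields
differ, and the 24-byte edge node carries additional per-list links and
flags.

  #include <linux/types.h>

  typedef u16 cfg_ref;          /* offset into the memory pool, 0 == NULL */

  struct bb_node {              /* 6 * 2 + 4 = 16 bytes */
          cfg_ref next;         /* next block, singly linked */
          cfg_ref e_prevs;      /* head of incoming edge list */
          cfg_ref e_succs;      /* head of outgoing edge list */
          u16 head;             /* index of the block's first insn */
          u16 idx;              /* block number */
          u16 idom;             /* immediate dominator's block number */
          u32 flags;            /* reachable/loop-marker bits etc. */
  };

  struct cedge_node {           /* 3 * 2 = 6 bytes */
          cfg_ref next;         /* next call edge, singly linked */
          u16 caller_idx;       /* calling subprog */
          u16 callee_idx;       /* called subprog */
  };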

memory usage (test_l4lb_noinline, test_xdp_noinline)
===
(counted across all subprogs):

test_l4lb_noinline:

  cfg info:  597 insns, 12 subprogs
             107 basic-blocks, 278 edges, 12 call edges.

  mem usage: 9216 (9K) bytes for memory pool (basic-blocks/edges/call-edges)
             644 bytes for dominator trees.

test_xdp_noinline:

  cfg info:  1029 insns, 21 subprogs
             191 basic-blocks, 502 edges, 23 call edges.

  mem usage: 15360 (15K) bytes for memory pool.
             1192 bytes for dominator trees.
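
As a cross-check, the raw node payload (using the per-node sizes from the
table above) accounts for most of the pool figures; the remainder is
presumably pool rounding and bookkeeping:

  test_l4lb_noinline: 107 * 16 + 278 * 24 + 12 * 6 = 8456 bytes
  test_xdp_noinline:  191 * 16 + 502 * 24 + 23 * 6 = 15242 bytes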

The existing check_cfg uses two auxiliary arrays (insn_state and
insn_stack) of insn_cnt * sizeof(int) bytes each, so it would use ~5K and
~8K respectively for these two tests. (It looks to me that insn_state
could use 1-byte elements and insn_stack 2-byte elements, so this could
probably be reduced to ~2K and ~3K.)
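
Spelling that out: two arrays of insn_cnt * 4 bytes give
2 * 597 * 4 = 4776 bytes (~5K) and 2 * 1029 * 4 = 8232 bytes (~8K); at
1 + 2 = 3 bytes per insn they would shrink to 597 * 3 = 1791 bytes (~2K)
and 1029 * 3 = 3087 bytes (~3K).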

Put differently, the existing check_cfg uses 8 bytes per insn during the
cfg check, while the new infrastructure uses roughly twice that, 16 bytes
per insn.

execution time
===
test_l4lb_noinline:
  existing check_subprog/check_cfg: ~55000 ns
  new infrastructure: ~135000 ns

test_xdp_noinline:
  existing check_subprog/check_cfg: ~52000 ns
  new infrastructure: ~120000 ns

This control flow infrastructure could lay the groundwork for further
static analysis inside the eBPF verifier, for example bounded loop
detection, path-sensitive data-flow analysis, etc.

Jiong Wang (10):
  bpf: cfg: partition basic blocks for each subprog
  bpf: cfg: add edges between basic blocks to form CFG
  bpf: cfg: build domination tree using Tarjan algorithm
  bpf: cfg: detect loop use domination information
  bpf: cfg: detect unreachable basic blocks
  bpf: cfg: move find_subprog/add_subprog to cfg.c
  bpf: cfg: build call graph and detect unreachable/recursive call
  bpf: cfg: remove push_insn and check_cfg
  bpf: cfg: reduce k*alloc/free call by using memory pool for allocating
    nodes
  bpf: cfg: reduce memory usage by using singly list + compat pointer

 include/linux/bpf_verifier.h |   8 +
 kernel/bpf/Makefile          |   2 +-
 kernel/bpf/cfg.c             | 956 +++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/cfg.h             |  42 ++
 kernel/bpf/verifier.c        | 386 ++++++-----------
 5 files changed, 1131 insertions(+), 263 deletions(-)
 create mode 100644 kernel/bpf/cfg.c
 create mode 100644 kernel/bpf/cfg.h

-- 
2.7.4
