From: Petar Penkov <ppen...@google.com>

This patch series hardens the RX stack by allowing flow dissection in BPF,
as previously discussed [1]. Because of the rigorous checks of the BPF
verifier, this provides significant security guarantees. In particular, the
BPF flow dissector cannot get stuck in an infinite loop, as with
CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot
read outside of packet bounds, because all memory accesses are checked.

Also, with BPF the administrator can decide which protocols to support,
reducing potential attack surface. Rarely encountered protocols can be
excluded from dissection, and the program can be updated without a kernel
recompile or reboot if a bug is discovered.
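As a rough illustration of the two properties above (guaranteed termination
and checked memory access), the following plain userspace C sketch walks
nested IPv4-in-IPv4 headers with a fixed iteration cap and an explicit bounds
check before every read. It is not BPF and not code from this series; the
names MAX_ENCAP_DEPTH, PROTO_IPIP and walk_ipip are hypothetical. In BPF, the
verifier refuses to load a program whose termination or memory accesses it
cannot prove safe, so these checks are not left to the programmer's
discipline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENCAP_DEPTH 4 /* hypothetical cap on nesting levels */
#define PROTO_IPIP      4 /* IANA protocol number for IPv4-in-IPv4 */

/*
 * Walk nested IPv4 headers starting at byte offset off.
 * Returns the offset just past the innermost IPv4 header, or -1 if the
 * packet is truncated, malformed, or nested deeper than MAX_ENCAP_DEPTH.
 */
static int walk_ipip(const uint8_t *pkt, size_t len, size_t off)
{
	assert(pkt != NULL);

	/* The loop bound is a compile-time constant, so it always terminates. */
	for (int depth = 0; depth < MAX_ENCAP_DEPTH; depth++) {
		/* Bounds check: a minimal IPv4 header is 20 bytes. */
		if (off > len || len - off < 20)
			return -1;

		uint8_t version = pkt[off] >> 4;
		size_t hdr_len = (size_t)(pkt[off] & 0x0f) * 4;
		uint8_t proto = pkt[off + 9];

		/*
		 * Reject header lengths below the 20-byte minimum (a zero
		 * length would make no forward progress) and headers that
		 * extend past the end of the packet.
		 */
		if (version != 4 || hdr_len < 20 || len - off < hdr_len)
			return -1;

		off += hdr_len;
		if (proto != PROTO_IPIP)
			return (int)off;
	}
	return -1; /* nested too deeply */
}
```

The fixed cap trades completeness for safety: a packet nested deeper than the
cap is simply not dissected.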
Patch 1 adds infrastructure to execute a BPF program in
__skb_flow_dissect. This includes a new BPF program and attach type.

Patch 2 adds a flow dissector program in BPF. This parses most protocols
in __skb_flow_dissect in BPF for a subset of flow keys (basic, control,
ports, and address types).

Patch 3 adds a selftest that attaches the BPF program to the flow
dissector and sends traffic with different levels of encapsulation.

This RFC patchset exposes a few design considerations:

1/ Because the flow dissector key definitions live in
   include/net/flow_dissector.h, they are not visible from userspace, and
   the flow key definitions need to be copied into the BPF program.

2/ An alternative to adding a new hook would have been to attach flow
   dissection programs at the XDP hook. Because this hook is executed
   before GRO, it would have to execute on every MSS, which would be more
   computationally expensive. Furthermore, the XDP hook is executed before
   an SKB has been allocated and there is no clear way to move the
   dissected keys into the SKB after it has been allocated. Eventually,
   perhaps a single pass can implement both GRO and flow dissection -- but
   napi_gro_cb shows that a lot more flow state would need to be parsed
   for this.

3/ The BPF program cannot use direct packet access everywhere because it
   uses an offset, initially supplied by the flow dissector. Because the
   initial value of this non-constant offset comes from outside of the
   program, the verifier does not know what its value is, and it cannot
   verify that it is within packet bounds. Therefore, programs that use
   direct packet access with this offset get rejected.

4/ Loading and attaching the BPF program requires capable(), as opposed to
   ns_capable(), because a malicious program might be able to return bad
   values that would trigger bugs in the kernel, such as the nhoff value
   bug fixed in commit 324f8305e59b ("net-backports: flow_dissector:
   properly cap thoff field").
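To illustrate point 3/, the sketch below models in plain userspace C why an
externally supplied offset leads to a checked copy on every access. The
function load_bytes is a hypothetical stand-in that only mimics the contract
of the bpf_skb_load_bytes() helper (copy a number of bytes from a given packet
offset, fail if the range is out of bounds), and parse_ports is likewise
hypothetical. This is an assumption about how such a program can be
structured, not the code from patch 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a load-bytes style helper: copy n bytes found at
 * offset off in the packet into the buffer to.
 * Returns 0 on success, -1 if the requested range is out of bounds.
 */
static int load_bytes(const uint8_t *pkt, size_t pkt_len, size_t off,
		      void *to, size_t n)
{
	assert(pkt != NULL && to != NULL);

	/* The offset is untrusted, so the range is validated at run time. */
	if (off > pkt_len || n > pkt_len - off)
		return -1;
	memcpy(to, pkt + off, n);
	return 0;
}

/*
 * Read the source and destination ports of a TCP or UDP header located at
 * transport offset thoff. Ports are returned in host byte order.
 * Returns 0 on success, -1 if the header does not fit in the packet.
 */
static int parse_ports(const uint8_t *pkt, size_t pkt_len, size_t thoff,
		       uint16_t *sport, uint16_t *dport)
{
	uint8_t raw[4];

	if (load_bytes(pkt, pkt_len, thoff, raw, sizeof(raw)) != 0)
		return -1;

	/* Ports are big-endian on the wire. */
	*sport = (uint16_t)((raw[0] << 8) | raw[1]);
	*dport = (uint16_t)((raw[2] << 8) | raw[3]);
	return 0;
}
```

The run-time check in load_bytes plays the role that the verifier cannot play
statically here: since thoff is not known when the program is loaded, the
range can only be validated when the access happens.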
[1] http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf

Petar Penkov (3):
  flow_dissector: implements flow dissector BPF hook
  flow_dissector: implements eBPF parser
  selftests/bpf: test bpf flow dissection

 include/linux/bpf_types.h                   |   1 +
 include/linux/skbuff.h                      |   7 +
 include/net/flow_dissector.h                |  16 +
 include/uapi/linux/bpf.h                    |  14 +-
 kernel/bpf/syscall.c                        |   8 +
 kernel/bpf/verifier.c                       |   2 +
 net/core/filter.c                           | 157 ++++
 net/core/flow_dissector.c                   |  76 ++
 tools/bpf/bpftool/prog.c                    |   1 +
 tools/include/uapi/linux/bpf.h              |   5 +-
 tools/lib/bpf/libbpf.c                      |   2 +
 tools/testing/selftests/bpf/.gitignore      |   2 +
 tools/testing/selftests/bpf/Makefile        |   8 +-
 tools/testing/selftests/bpf/bpf_flow.c      | 542 ++++++++++++
 tools/testing/selftests/bpf/bpf_helpers.h   |   3 +
 tools/testing/selftests/bpf/config          |   1 +
 .../selftests/bpf/flow_dissector_load.c     | 140 ++++
 .../selftests/bpf/test_flow_dissector.c     | 782 ++++++++++++++++++
 .../selftests/bpf/test_flow_dissector.sh    | 115 +++
 tools/testing/selftests/bpf/with_addr.sh    |  54 ++
 tools/testing/selftests/bpf/with_tunnels.sh |  36 +
 21 files changed, 1967 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/bpf_flow.c
 create mode 100644 tools/testing/selftests/bpf/flow_dissector_load.c
 create mode 100644 tools/testing/selftests/bpf/test_flow_dissector.c
 create mode 100755 tools/testing/selftests/bpf/test_flow_dissector.sh
 create mode 100755 tools/testing/selftests/bpf/with_addr.sh
 create mode 100755 tools/testing/selftests/bpf/with_tunnels.sh

-- 
2.18.0.865.gffc8e1a3cd6-goog