On Mon, Aug 20, 2018 at 1:52 PM, Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote: > On Thu, Aug 16, 2018 at 09:44:20AM -0700, Petar Penkov wrote: >> From: Petar Penkov <ppen...@google.com> >> >> This patch series hardens the RX stack by allowing flow dissection in BPF, >> as previously discussed [1]. Because of the rigorous checks of the BPF >> verifier, this provides significant security guarantees. In particular, the >> BPF flow dissector cannot get inside of an infinite loop, as with >> CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot >> read outside of packet bounds, because all memory accesses are checked. >> Also, with BPF the administrator can decide which protocols to support, >> reducing potential attack surface. Rarely encountered protocols can be >> excluded from dissection and the program can be updated without kernel >> recompile or reboot if a bug is discovered. >> >> Patch 1 adds infrastructure to execute a BPF program in __skb_flow_dissect. >> This includes a new BPF program and attach type. >> >> Patch 2 adds a flow dissector program in BPF. This parses most protocols in >> __skb_flow_dissect in BPF for a subset of flow keys (basic, control, ports, >> and address types). >> >> Patch 3 adds a selftest that attaches the BPF program to the flow dissector >> and sends traffic with different levels of encapsulation. > > Overall I fully support the direction. Few things to consider: > >> This RFC patchset exposes a few design considerations: >> >> 1/ Because the flow dissector key definitions live in >> include/linux/net/flow_dissector.h, they are not visible from userspace, >> and the flow keys definitions need to be copied in the BPF program. > > I don't think copy-paste avoids the issue of uapi. > Anything used by BPF program is uapi. > The only exception is offsets of kernel internal structures > passed into bpf_probe_read(). > So we have several options: > 1. be honest and say 'struct flow_dissect_key*' is now uapi > 2. wrap all of them into 'struct bpf_flow_dissect_key*' and do rewrites > when/if 'struct flow_dissect_key*' changes > 3. wait for BTF to solve it for tracing use case and for this one two. > The idea is that kernel internal structs can be defined in bpf prog > and since they will be described precisely in BTF that comes with the prog > the kernel can validate that prog's BTF matches what kernel thinks it has. > imo that's the most flexible, but BTF for all of vmlinux won't be ready > tomorrow and looks like this patch set is ready to go, so I would go with 1 > or 2.
I will pursue #2 then as both you and David are in support of it. > >> 2/ An alternative to adding a new hook would have been to attach flow >> dissection programs at the XDP hook. Because this hook is executed before >> GRO, it would have to execute on every MSS, which would be more >> computationally expensive. Furthermore, the XDP hook is executed before an >> SKB has been allocated and there is no clear way to move the dissected keys >> into the SKB after it has been allocated. Eventually, perhaps a single pass >> can implement both GRO and flow dissection -- but napi_gro_cb shows that a >> lot more flow state would need to be parsed for this. > > global flow_dissect bpf hook semantics are problematic for testing. > like patch 3 test affects the whole system including all containers. > should the hook be per-netns or may be per netdevice? > Having the hook be per-netns would definitely be cleaner for testing. I will look into refactoring the hook from global to per-netns. >> 3/ The BPF program cannot use direct packet access everywhere because it >> uses an offset, initially supplied by the flow dissector. Because the >> initial value of this non-constant offset comes from outside of the >> program, the verifier does not know what its value is, and it cannot verify >> that it is within packet bounds. Therefore, direct packet access programs >> get rejected. > > this part doesn't seem to match the code. > direct packet access is allowed and usable even for fragmented skbs. > in such case only linear part of skb is in "direct access". I am not sure I understand. What I meant was that I use bpf_skb_load_bytes rather than direct packet access because the offset at which I read headers, nhoff, depends on an initial value that cannot be statically verified - namely what __skb_flow_dissect provides. Is there an alternative approach I should be taking here, and/or am I misunderstanding direct access? > > Last bit, I'm curious about, how the 'demo' flow dissector program > from patch 2 fairs vs in-kernel dissector when performance tested? > I used the test in patch 3 to compare the two implementations and the difference seemed to be small, with the BPF flow dissector being slightly slower. Thank you so much for your feedback, Alexei and David!