On Sun, Jul 10, 2016 at 12:09 PM, Brenden Blanco <bbla...@plumgrid.com> wrote: > On Sun, Jul 10, 2016 at 03:37:31PM +0200, Jesper Dangaard Brouer wrote: >> On Sat, 9 Jul 2016 08:47:52 -0500 >> Tom Herbert <t...@herbertland.com> wrote: >> >> > On Sat, Jul 9, 2016 at 3:14 AM, Jesper Dangaard Brouer >> > <bro...@redhat.com> wrote: >> > > On Thu, 7 Jul 2016 19:15:13 -0700 >> > > Brenden Blanco <bbla...@plumgrid.com> wrote: >> > > >> > >> Add a new bpf prog type that is intended to run in early stages of the >> > >> packet rx path. Only minimal packet metadata will be available, hence a >> > >> new context type, struct xdp_md, is exposed to userspace. So far only >> > >> expose the packet start and end pointers, and only in read mode. >> > >> >> > >> An XDP program must return one of the well known enum values, all other >> > >> return codes are reserved for future use. Unfortunately, this >> > >> restriction is hard to enforce at verification time, so take the >> > >> approach of warning at runtime when such programs are encountered. The >> > >> driver can choose to implement unknown return codes however it wants, >> > >> but must invoke the warning helper with the action value. >> > > >> > > I believe we should define a stronger semantics for unknown/future >> > > return codes than the once stated above: >> > > "driver can choose to implement unknown return codes however it wants" >> > > >> > > The mlx4 driver implementation in: >> > > [PATCH v6 04/12] net/mlx4_en: add support for fast rx drop bpf program >> > > >> > > On Thu, 7 Jul 2016 19:15:16 -0700 Brenden Blanco <bbla...@plumgrid.com> >> > > wrote: >> > > >> > >> + /* A bpf program gets first chance to drop the packet. It >> > >> may >> > >> + * read bytes but not past the end of the frag. >> > >> + */ >> > >> + if (prog) { >> > >> + struct xdp_buff xdp; >> > >> + dma_addr_t dma; >> > >> + u32 act; >> > >> + >> > >> + dma = be64_to_cpu(rx_desc->data[0].addr); >> > >> + dma_sync_single_for_cpu(priv->ddev, dma, >> > >> + >> > >> priv->frag_info[0].frag_size, >> > >> + DMA_FROM_DEVICE); >> > >> + >> > >> + xdp.data = page_address(frags[0].page) + >> > >> + >> > >> frags[0].page_offset; >> > >> + xdp.data_end = xdp.data + length; >> > >> + >> > >> + act = bpf_prog_run_xdp(prog, &xdp); >> > >> + switch (act) { >> > >> + case XDP_PASS: >> > >> + break; >> > >> + default: >> > >> + bpf_warn_invalid_xdp_action(act); >> > >> + case XDP_DROP: >> > >> + goto next; >> > >> + } >> > >> + } >> > > >> > > Thus, mlx4 choice is to drop packets for unknown/future return codes. >> > > >> > > I think this is the wrong choice. I think the choice should be >> > > XDP_PASS, to pass the packet up the stack. >> > > >> > > I find "XDP_DROP" problematic because it happen so early in the driver, >> > > that we lost all possibilities to debug what packets gets dropped. We >> > > get a single kernel log warning, but we cannot inspect the packets any >> > > longer. By defaulting to XDP_PASS all the normal stack tools (e.g. >> > > tcpdump) is available. > The goal of XDP is performance, and therefore the driver-specific choice > I am making here is to drop, because it preserves resources to do so. I > cannot say for all drivers whether this is the right choice or not. > Therefore, in the user-facing API I leave it undefined, so that future > drivers can have flexibility to choose the most performant > implementation for themselves. > I don't think this should be undefined. If a driver receives a code from XDP that it doesn't understand this is an API mismatch and a bug somewhere.
> Consider the UDP DDoS use case that we have mentioned before. Suppose an > attacker has knowledge of the particular XDP program that is being used > to filter their packets, and can somehow overflow the return code of the > program. The attacker would prefer that the overflow case consumes > time/memory/both, which if the mlx4 driver were to pass to stack it > would certainly do, and so we must choose the opposite if we have > network security in mind (we do!). Yes. >> > > >> > >> > It's an API issue though not a problem with the packet. Allowing >> > unknown return codes to pass seems like a major security problem also. >> >> We have the full power and flexibility of the normal Linux stack to >> drop these packets. And from a usability perspective it gives insight >> into what is wrong and counters metrics. Would you rather blindly drop >> e.g. 0.01% of the packets in your data-centers without knowing. > Full power, but not full speed, and in the case of DDoS mitigation this > is a strong enough argument IMHO. >> >> We already talk about XDP as an offload mechanism. Normally when >> loading a (XDP) "offload" program it should be rejected, e.g. by the >> validator. BUT we cannot validate all return eBPF codes, because they >> can originate from a table lookup. Thus, we _do_ allow programs to be >> loaded, with future unknown return code. >> This then corresponds to only part of the program can be offloaded, >> thus the natural response is to fallback, handling this is the >> non-offloaded slower-path. >> >> I see the XDP_PASS fallback as a natural way of supporting loading >> newer/future programs on older "versions" of XDP. >> E.g. I can have a XDP program that have a valid filter protection >> mechanism, but also use a newer mechanism, and my server fleet contains >> different NIC vendors, some NICs only support the filter part. Then I >> want to avoid having to compile and maintain different XDP/eBPF >> programs per NIC vendor. (Instead I prefer having a Linux stack >> fallback mechanism, and transparently XDP offload as much as the NIC >> driver supports). > I would then argue to only support offloading of XDP programs with > verifiable return codes. We're not at that stage yet, and I think we can > choose different defaults for these two cases. > > We have conflicting examples here, which lead to different conclusions. > Reiterating an earlier argument that I made for others on the list to > consider: > """ > Besides, I don't see how PASS is any more correct than DROP. Consider a > future program that is intended to rewrite a packet and forward it out > another port (with some TX_OTHER return code or whatever). If the driver > PASSes the packet, it will still not be interpreted by the stack, since > it may have been destined for some other machine. > """ > So, IMHO there is not a clear right or wrong, and I still fall back to > the security argument to resolve the dilemma. The point there is not > drop/pass, but resource preservation. > Blind pass is a security risk, drop is always a correct action in that sense. Tom >> >> >> > > I can also imagine that, defaulting to XDP_PASS, can be an important >> > > feature in the future. >> > > >> > > In the future we will likely have features, where XDP can "offload" >> > > packet delivery from the normal stack (e.g. delivery into a VM). On a >> > > running production system you can then load your XDP program. If the >> > > driver was too old defaulting to XDP_DROP, then you lost your service, >> > > instead if defaulting to XDP_PASS, your service would survive, falling >> > > back to normal delivery. >> > > >> > > (For the VM delivery use-case, there will likely be a need for having a >> > > fallback delivery method in place, when the XDP program is not active, >> > > in-order to support VM migration). >> > > >> > > >> > > > [...]