On 1 November 2016 at 09:17, Tom Herbert <t...@herbertland.com> wrote:
> On Mon, Oct 31, 2016 at 5:37 PM, Thomas Graf <tg...@suug.ch> wrote:
>> {Open question:
>>  Tom brought up the question on whether it is safe to modify the packet
>>  in arbitrary ways before dst_output(). This is the equivalent to a raw
>>  socket injecting illegal headers. This v2 currently assumes that
>>  dst_output() is ready to accept invalid header values. This needs to be
>>  verified and if not the case, then raw sockets or dst_output() handlers
>>  must be fixed as well. Another option is to mark lwtunnel_output() as
>>  read-only for now.}
>>
> The question might not be so much about illegal headers but whether
> fields in the skbuff related to the packet contents are kept correct.
> We have protocol, header offsets, offsets for inner protocols also,
> encapsulation settings, checksum status, checksum offset, checksum

The headers cannot be extended or reduced so the offsets always remain
correct. What can happen is that the header contains invalid data.

> complete value, vlan information. Any or all of which I believe could
> be turned into being incorrect if we allow the packet to be
> arbitrarily modified by BPF. This problem is different than raw
> sockets because LWT operates in the middle of the stack, the skbuff
> has already been set up with such things.

You keep saying this "middle in the stack" but the point is exactly the
same as a raw socket with IPPROTO_RAW and hdrincl, see rawv6_sendmsg()
and rawv6_send_hdrincl(). An IPv6 raw socket can feed arbitrary garbage
into dst_output(). IPv4 does some minimal sanity checks.

If this is a concern I'm fine with making the dst_output path read-only
for now.

>> This series implements BPF program invocation from dst entries via the
>> lightweight tunnels infrastructure. The BPF program can be attached to
>> lwtunnel_input(), lwtunnel_output() or lwtunnel_xmit() and sees an L3
>> skb as context. input is read-only, output can write, xmit can write,
>> push headers, and redirect.
>>
>> Motivation for this work:
>>  - Restricting outgoing routes beyond what the route tuple supports
>>  - Per route accounting beyond realms
>>  - Fast attachment of L2 headers where header does not require resolving
>>    L2 addresses
>>  - ILA like use cases where L3 addresses are resolved and then routed
>>    in an async manner
>>  - Fast encapsulation + redirect. For now limited to use cases where not
>>    setting inner and outer offset/protocol is OK.
>>
> Is checksum offload supported? By default, at least for Linux, we
> offload the outer UDP checksum in VXLAN and the other UDP
> encapsulations for performance.

No. UDP encap is done by setting a tunnel key through a helper and
letting the encapsulation device handle this. I don't currently see a
point in replicating all of that logic.
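
To make the raw socket comparison above concrete, here is a minimal
user-space sketch, not code from the series. It assumes a Linux host; the
helper names build_ipv6_hdr() and send_hdrincl() are hypothetical and exist
only for illustration. The first helper writes caller-chosen values into a
fixed 40-byte IPv6 header, so the contents can be invalid while the length
(and any offset derived from it) stays the same. The second hands the
buffer to an AF_INET6/SOCK_RAW/IPPROTO_RAW socket, which needs CAP_NET_RAW.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define IPV6_HDR_LEN 40 /* the fixed IPv6 header is always 40 bytes */

/*
 * Hypothetical helper: fill a fixed-size IPv6 header with whatever values
 * the caller passes, valid or not. Source and destination addresses are
 * left as all zeroes. Returns the header length, or 0 if buf is too small.
 */
size_t build_ipv6_hdr(uint8_t *buf, size_t buflen, uint8_t version,
                      uint16_t payload_len, uint8_t next_hdr,
                      uint8_t hop_limit)
{
    assert(buf != NULL);
    if (buflen < IPV6_HDR_LEN)
        return 0;

    memset(buf, 0, IPV6_HDR_LEN);
    buf[0] = (uint8_t)((version & 0x0f) << 4); /* version nibble, tclass 0 */
    buf[4] = (uint8_t)(payload_len >> 8);      /* payload length, big endian */
    buf[5] = (uint8_t)(payload_len & 0xff);
    buf[6] = next_hdr;
    buf[7] = hop_limit;
    return IPV6_HDR_LEN;
}

/*
 * Hypothetical helper: pass a caller-built packet, IPv6 header included,
 * to an IPPROTO_RAW socket. Returns 0 if sendto() accepted the buffer and
 * -1 otherwise (for example when CAP_NET_RAW is missing). Whether the
 * stack accepts a particular bogus header is not asserted here.
 */
int send_hdrincl(const uint8_t *pkt, size_t len)
{
    struct sockaddr_in6 dst;
    ssize_t n;
    int fd = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);

    if (fd < 0)
        return -1;

    memset(&dst, 0, sizeof(dst));
    dst.sin6_family = AF_INET6;
    dst.sin6_addr = in6addr_loopback; /* route lookup target: ::1 */

    n = sendto(fd, pkt, len, 0, (const struct sockaddr *)&dst, sizeof(dst));
    close(fd);
    return n < 0 ? -1 : 0;
}
```

For example, build_ipv6_hdr(buf, sizeof(buf), 15, 0xffff, 59, 0) produces a
header with a bogus version and payload length, yet the header is still
exactly 40 bytes long. That is the distinction drawn above between invalid
header contents and incorrect offsets.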