On Thu, Jan 31, 2019 at 5:47 PM Willem de Bruijn <willemdebruijn.ker...@gmail.com> wrote: > > On Thu, Jan 31, 2019 at 5:04 PM Daniel Borkmann <dan...@iogearbox.net> wrote: > > > > On 01/31/2019 12:51 AM, Peter Oskolkov wrote: > > > This patch implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap > > > BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN > > > and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers > > > to packets (e.g. IP/GRE, GUE, IPIP). > > > > > > This is useful when thousands of different short-lived flows should be > > > encapped, each with different and dynamically determined destination. > > > Although lwtunnels can be used in some of these scenarios, the ability > > > to dynamically generate encap headers adds more flexibility, e.g. > > > when routing depends on the state of the host (reflected in global bpf > > > maps). > > > > > > Signed-off-by: Peter Oskolkov <p...@google.com> > > > > +int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool > > > ingress) > > > +{ > > > + struct iphdr *iph; > > > + bool ipv4; > > > + int err; > > > + > > > + if (unlikely(len < sizeof(struct iphdr) || len > > > > LWT_BPF_MAX_HEADROOM)) > > > + return -EINVAL; > > > + > > > + /* validate protocol and length */ > > > + iph = (struct iphdr *)hdr; > > > + if (iph->version == 4) { > > > + ipv4 = true; > > > + if (unlikely(len < iph->ihl * 4)) > > > + return -EINVAL; > > > + } else if (iph->version == 6) { > > > + ipv4 = false; > > > + if (unlikely(len < sizeof(struct ipv6hdr))) > > > + return -EINVAL; > > > + } else { > > > + return -EINVAL; > > > + } > > > + > > > + if (ingress) > > > + err = skb_cow_head(skb, len + skb->mac_len); > > > + else > > > + err = skb_cow_head(skb, > > > + len + > > > LL_RESERVED_SPACE(skb_dst(skb)->dev)); > > > + if (unlikely(err)) > > > + return err; > > > + > > > + /* push the encap headers and fix pointers */ > > > + skb_reset_inner_headers(skb); > > > + skb->encapsulation = 1; > > > + skb_push(skb, len); > > > + if (ingress) > > > + skb_postpush_rcsum(skb, iph, len); > > > + skb_reset_network_header(skb); > > > + memcpy(skb_network_header(skb), hdr, len); > > > + bpf_compute_data_pointers(skb); > > > > Does this work transparently with GSO as well or would we need to > > update shared info for this (like in nat64 case, for example)? > > Good point. It does need to update the gso_type to include the tunnel > type, similar to iptunnel_handle_offloads. > > Only, the nice feature of this interface is that it is encap protocol > independent, which implies that it does not know the correct type. > > I don't think that we want to allow programs to write the gso_type themselves. > > With GSO_PARTIAL, perhaps specifying the exact tunnel type can be > avoided as long as it is a fixed prefix to replicate? > > The transport layer size does not change, so no need to recompute gso_segs? > > Either way, this seems non-trivial enough to me to do in a separate > follow-on patch. For now just fail if skb_is_gso.
Thanks, Willem, for the details! I'll re-send the patchset with the additional check that skb_is_gso() is false.