On Fri, Dec 29, 2017 at 3:44 AM, Jason Wang <jasow...@redhat.com> wrote: > This patch allows userspace to attach eBPF filter to tun. This will > allow to implement VM dataplane filtering in a more efficient way > compared to cBPF filter.
Is the idea to allow the trusted hypervisor to install these programs, or the untrusted guests? eBPF privilege escalations like those recently described in https://lwn.net/Articles/742170/ would give me pause to expose this to guests. > Signed-off-by: Jason Wang <jasow...@redhat.com> > --- > drivers/net/tun.c | 26 ++++++++++++++++++++++++++ > include/uapi/linux/if_tun.h | 1 + > 2 files changed, 27 insertions(+) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index 0853829..6e9452b 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -238,6 +238,7 @@ struct tun_struct { > struct tun_pcpu_stats __percpu *pcpu_stats; > struct bpf_prog __rcu *xdp_prog; > struct tun_prog __rcu *steering_prog; > + struct tun_prog __rcu *filter_prog; > }; > > static int tun_napi_receive(struct napi_struct *napi, int budget) > @@ -984,12 +985,25 @@ static void tun_automq_xmit(struct tun_struct *tun, > struct sk_buff *skb) > #endif > } > > +static unsigned int run_ebpf_filter(struct tun_struct *tun, > + struct sk_buff *skb, > + int len) > +{ > + struct tun_prog *prog = rcu_dereference(tun->filter_prog); > + > + if (prog) > + len = bpf_prog_run_clear_cb(prog->prog, skb); > + > + return len; > +} > + > /* Net device start xmit */ > static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) > { > struct tun_struct *tun = netdev_priv(dev); > int txq = skb->queue_mapping; > struct tun_file *tfile; > + int len = skb->len; > > rcu_read_lock(); > tfile = rcu_dereference(tun->tfiles[txq]); > @@ -1015,9 +1029,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, > struct net_device *dev) > sk_filter(tfile->socket.sk, skb)) > goto drop; > > + len = run_ebpf_filter(tun, skb, len); > + if (!len) > + goto drop; > + This adds a second filter step independent of the sk_filter call above. Perhaps the two filter interfaces can map onto to the same instance. I imagine that qemu never programs SO_ATTACH_FILTER. More importantly, should this program just return a boolean pass or drop. Taking a length and trimming may introduce bugs later on if the stack parses the packet unconditionally, expecting a minimum size to be present. This was the reason for introducing sk_filter_trim_cap and using that in other sk_filter sites. A quick scan shows that tun_put_user expects a full vlan tag to exist if skb_vlan_tag_present(skb), for instance. If trimmed to below this length the final call to skb_copy_datagram_iter may have negative length. This is an issue with the existing sk_filter call as much as with the new run_ebpf_filter call. > if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC))) > goto drop; > > + if (pskb_trim(skb, len)) > + goto drop; > + > skb_tx_timestamp(skb);