On Wed, Aug 17, 2016 at 11:20:29AM -0700, Alexei Starovoitov wrote: > On Wed, Aug 17, 2016 at 04:00:47PM +0200, Daniel Mack wrote: > > If CONFIG_CGROUP_BPF is enabled, and the cgroup associated with the > > receiving socket has an eBPF programs installed, run them from > > sk_filter_trim_cap(). > > > > eBPF programs used in this context are expected to either return 1 to > > let the packet pass, or != 1 to drop them. The programs have access to > > the full skb, including the MAC headers. > > > > This patch only implements the call site for ingress packets. > > > > Signed-off-by: Daniel Mack <dan...@zonque.org> > > --- > > net/core/filter.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 44 insertions(+) > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > index c5d8332..a1dd94b 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -52,6 +52,44 @@ > > #include <net/dst.h> > > #include <net/sock_reuseport.h> > > > > +#ifdef CONFIG_CGROUP_BPF > > +static int sk_filter_cgroup_bpf(struct sock *sk, struct sk_buff *skb, > > + enum bpf_attach_type type) > > +{ > > + struct sock_cgroup_data *skcd = &sk->sk_cgrp_data; > > + struct cgroup *cgrp = sock_cgroup_ptr(skcd); > > + struct bpf_prog *prog; > > + int ret = 0; > > + > > + rcu_read_lock(); > > + > > + switch (type) { > > + case BPF_ATTACH_TYPE_CGROUP_EGRESS: > > + prog = rcu_dereference(cgrp->bpf_egress); > > + break; > > + case BPF_ATTACH_TYPE_CGROUP_INGRESS: > > + prog = rcu_dereference(cgrp->bpf_ingress); > > + break; > > + default: > > + WARN_ON_ONCE(1); > > + ret = -EINVAL; > > + break; > > + } > > + > > + if (prog) { > > I really like how in this version of the patches it became > a single load+cmp of per-packet cost when this feature is off. > Please move > + struct cgroup *cgrp = sock_cgroup_ptr(skcd); > into if (prog) {..} > to make sure it's actually single load. > The compiler cannot avoid that load when it's placed at the top.
sorry. brain fart. it is two loads. scratch that. > > + unsigned int offset = skb->data - skb_mac_header(skb); > > + > > + __skb_push(skb, offset); > > + ret = bpf_prog_run_clear_cb(prog, skb) > 0 ? 0 : -EPERM; > > that doesn't match commit log. The above '> 0' makes sense to me though. > If we want to do it for 1 only we have to define it in uapi/bpf.h > as action code, so we can extend to 2, 3 in the future if necessary. > > It also have to be bpf_prog_run_save_cb() (as sk_filter_trim_cap() does) > instead of bpf_prog_run_clear_cb(). > See commit ff936a04e5f2 ("bpf: fix cb access in socket filter programs") > > > + __skb_pull(skb, offset); > > + } > > + > > + rcu_read_unlock(); > > + > > + return ret; > > +} > > +#endif /* !CONFIG_CGROUP_BPF */ > > + > > /** > > * sk_filter_trim_cap - run a packet through a socket filter > > * @sk: sock associated with &sk_buff > > @@ -78,6 +116,12 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff > > *skb, unsigned int cap) > > if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC)) > > return -ENOMEM; > > > > +#ifdef CONFIG_CGROUP_BPF > > + err = sk_filter_cgroup_bpf(sk, skb, BPF_ATTACH_TYPE_CGROUP_INGRESS); > > + if (err) > > + return err; > > +#endif > > + > > err = security_sock_rcv_skb(sk, skb); > > if (err) > > return err; > > -- > > 2.5.5 > >