On Wed, Sep 12, 2018 at 6:25 PM Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote: > > On Wed, Sep 12, 2018 at 02:43:37PM -0400, Willem de Bruijn wrote: > > On Tue, Sep 11, 2018 at 11:47 PM Alexei Starovoitov > > <alexei.starovoi...@gmail.com> wrote: > > > > > > On Fri, Sep 07, 2018 at 05:11:08PM -0700, Petar Penkov wrote: > > > > From: Petar Penkov <ppen...@google.com> > > > > > > > > Adds a hook for programs of type BPF_PROG_TYPE_FLOW_DISSECTOR and > > > > attach type BPF_FLOW_DISSECTOR that is executed in the flow dissector > > > > path. The BPF program is per-network namespace. > > > > > > > > Signed-off-by: Petar Penkov <ppen...@google.com> > > > > Signed-off-by: Willem de Bruijn <will...@google.com> > > > > --- > > > > include/linux/bpf.h | 1 + > > > > include/linux/bpf_types.h | 1 + > > > > include/linux/skbuff.h | 7 ++ > > > > include/net/net_namespace.h | 3 + > > > > include/net/sch_generic.h | 12 ++- > > > > include/uapi/linux/bpf.h | 25 ++++++ > > > > kernel/bpf/syscall.c | 8 ++ > > > > kernel/bpf/verifier.c | 32 ++++++++ > > > > net/core/filter.c | 67 ++++++++++++++++ > > > > net/core/flow_dissector.c | 136 +++++++++++++++++++++++++++++++++ > > > > tools/bpf/bpftool/prog.c | 1 + > > > > tools/include/uapi/linux/bpf.h | 25 ++++++ > > > > tools/lib/bpf/libbpf.c | 2 + > > > > > > please split up update to tools/include/uapi/linux/bpf.h as a separate > > > patch 2. > > > We often have conflicts in there, so best to have a separate. > > > Also please split tools/lib and tools/bpf chnages into patch 3. > > > > Will do in v3. > > > > > > struct qdisc_skb_cb { > > > > - unsigned int pkt_len; > > > > - u16 slave_dev_queue_mapping; > > > > - u16 tc_classid; > > > > + union { > > > > + struct { > > > > + unsigned int pkt_len; > > > > + u16 slave_dev_queue_mapping; > > > > + u16 tc_classid; > > > > + }; > > > > + struct bpf_flow_keys *flow_keys; > > > > + }; > > > > > > is this magic really necessary? flow_dissector runs very early in recv > > > path. > > > There is no qdisc or conflicts with tcp/ip use of cb. > > > I think the whole cb block can be used. > > > > The flow dissector also runs in the context of TC, from flower. > > But that is not a reason to use this struct. > > > > We need both (a) data shared with the BPF application and between > > applications using tail-calls, to pass along the parse offset (nhoff), > > and (b) data not accessible by the program, to store the flow_keys > > pointer. > > > > qdisc_skb_cb already has this split, exposing only the 20B .data > > field to the application. Flow dissection currently reuses the existing > > bpf_convert_ctx_access logic for this field. > > > > We could create a separate flowdissect_skb_cb struct with the > > same split setup, but a second constraint is that relevant internal > > BPF interfaces already expect qdisc_skb_cb, notably > > bkf_skb_data_end. So the union was easier. > > got it. all makes sense. > > > > > There is another way to pass nhoff besides cb[] (see below). But > > I don't immediately see another place to store the flow_keys ptr. > > > > At least, if we pass skb as context. One larger change would > > be to introduce another ctx struct, similar to sk_reuseport_(kern|md). > > yeah. thought about this too, but your approach looks easier and faster. > Accesses to skb have one less dereference. > > > > > @@ -2333,6 +2335,7 @@ struct __sk_buff { > > > > /* ... here. */ > > > > > > > > __u32 data_meta; > > > > + __u32 flow_keys; > > > > > > please use > > > struct bpf_flow_keys *flow_keys; > > > instead. > > > > > > See what we did in 'struct sk_msg_md' and in 'struct sk_reuseport_md'. > > > There is no need to hide pointers in u32. > > > > > > > Will do in v3. > > > > > > @@ -658,6 +754,46 @@ bool __skb_flow_dissect(const struct sk_buff *skb, > > > > FLOW_DISSECTOR_KEY_BASIC, > > > > target_container); > > > > > > > > + rcu_read_lock(); > > > > + attached = skb ? > > > > rcu_dereference(dev_net(skb->dev)->flow_dissector_prog) > > > > + : NULL; > > > > + if (attached) { > > > > + /* Note that even though the const qualifier is discarded > > > > + * throughout the execution of the BPF program, all > > > > changes(the > > > > + * control block) are reverted after the BPF program > > > > returns. > > > > + * Therefore, __skb_flow_dissect does not alter the skb. > > > > + */ > > > > + struct bpf_flow_keys flow_keys = {}; > > > > + struct qdisc_skb_cb cb_saved; > > > > + struct qdisc_skb_cb *cb; > > > > + u16 *pseudo_cb; > > > > + u32 result; > > > > + > > > > + cb = qdisc_skb_cb(skb); > > > > + pseudo_cb = (u16 *)bpf_skb_cb((struct sk_buff *)skb); > > > > + > > > > + /* Save Control Block */ > > > > + memcpy(&cb_saved, cb, sizeof(cb_saved)); > > > > + memset(cb, 0, sizeof(cb_saved)); > > > > + > > > > + /* Pass parameters to the BPF program */ > > > > + cb->flow_keys = &flow_keys; > > > > + *pseudo_cb = nhoff; > > > > > > I don't understand this bit. > > > What is this pseudo_cb and why nhoff goes in there? > > > Some odd way to pass it into the prog? > > > > Yes. nhoff passes the offset to the program to start parsing from. > > Applications also pass this during tail calls. > > > > Alternatively we can just add a new field to struct bpf_flow_keys. > > I think that certainly will be cleaner and easier to use from > bpf prog pov. Since flow_keys stay constant any change to nhoff > between tail_calls will be preserved too. I see no cons to such approach.
Yes, it's definitely simpler. We'll do that. Thanks!