On Tue, Feb 12, 2019 at 09:02:32AM -0800, Stanislav Fomichev wrote:
> On 02/05, Stanislav Fomichev wrote:
> > On 02/05, Alexei Starovoitov wrote:
> > > On Tue, Feb 05, 2019 at 07:56:19PM -0800, Stanislav Fomichev wrote:
> > > > On 02/05, Alexei Starovoitov wrote:
> > > > > On Tue, Feb 05, 2019 at 04:59:31PM -0800, Stanislav Fomichev wrote:
> > > > > > On 02/05, Alexei Starovoitov wrote:
> > > > > > > On Tue, Feb 05, 2019 at 12:40:03PM -0800, Stanislav Fomichev wrote:
> > > > > > > > On 02/05, Willem de Bruijn wrote:
> > > > > > > > > On Tue, Feb 5, 2019 at 12:57 PM Stanislav Fomichev <s...@google.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Currently, when eth_get_headlen calls flow dissector, it doesn't pass any
> > > > > > > > > > skb. Because we use passed skb to lookup associated networking namespace
> > > > > > > > > > to find whether we have a BPF program attached or not, we always use
> > > > > > > > > > C-based flow dissector in this case.
> > > > > > > > > >
> > > > > > > > > > The goal of this patch series is to add new networking namespace argument
> > > > > > > > > > to the eth_get_headlen and make BPF flow dissector programs be able to
> > > > > > > > > > work in the skb-less case.
> > > > > > > > > >
> > > > > > > > > > The series goes like this:
> > > > > > > > > > 1. introduce __init_skb and __init_skb_shinfo; those will be used to
> > > > > > > > > >    initialize temporary skb
> > > > > > > > > > 2. introduce skb_net which can be used to get networking namespace
> > > > > > > > > >    associated with an skb
> > > > > > > > > > 3. add new optional network namespace argument to __skb_flow_dissect and
> > > > > > > > > >    plumb through the callers
> > > > > > > > > > 4. add new __flow_bpf_dissect which constructs temporary on-stack skb
> > > > > > > > > >    (using __init_skb) and calls BPF flow dissector program
> > > > > > > > >
> > > > > > > > > The main concern I see with this series is this cost of skb zeroing
> > > > > > > > > for every packet in the device driver receive routine, *independent*
> > > > > > > > > from the real skb allocation and zeroing which will likely happen
> > > > > > > > > later.
> > > > > > > > Yes, plus ~200 bytes on the stack for the callers.
> > > > > > > >
> > > > > > > > Not sure how visible this zeroing though, I can probably try to get some
> > > > > > > > numbers from BPF_PROG_TEST_RUN (running current version vs running with
> > > > > > > > on-stack skb).
> > > > > > >
> > > > > > > imo extra 256 byte memset for every packet is non starter.
> > > > > > We can put pre-allocated/initialized skbs without data into percpu or even
> > > > > > use pcpu_freelist_pop/pcpu_freelist_push to make sure we don't have to think
> > > > > > about having multiple percpu for irq/softirq/process contexts.
> > > > > > Any concerns with that approach?
> > > > > > Any other possible concerns with the overall series?
> > > > >
> > > > > I'm missing why the whole thing is needed.
> > > > > You're saying:
> > > > > " make BPF flow dissector programs be able to work in the skb-less case".
> > > > > What does it mean specifically?
> > > > > The only non-skb case is XDP.
> > > > > Are you saying you want flow_dissector prog to be run in XDP?
> > > > eth_get_headlen that drivers call on RX path on a chunk of data to
> > > > guesstimate the length of the headers calls flow dissector without an skb
> > > > (__skb_flow_dissect was a weird interface where it accepts skb or data+len).
> > > > Right now, there is no way to trigger BPF flow dissector
> > > > for this case (we don't have an skb to get associated namespace/etc/etc).
> > > > The patch series tries to fix that to make sure that we always trigger
> > > > BPF program if it's attached to a device's namespace.
> > >
> > > then why not to create flow_dissector prog type that works without skb?
> > > Why do you need to fake an skb?
> > > XDP progs work just fine without it.
> > What's the advantage of having another prog type? In this case we would have
> > to write the same flow dissector program twice: first time against __sk_buff
> > interface, second time against xdp_md.
> > By using fake skb, we make the same flow dissector __sk_buff BPF program
> > work in both contexts without a rewrite to an xdp interface (I don't
> > think users should care whether flow dissector was called from "xdp" vs skb
> > context; and we're sort of stuck with __sk_buff interface already).
> Should I follow up with v2 where I address memset(,,256) for each packet?
> Or you still have some questions/doubts/suggestions regarding the problem
> I'm trying to solve?

Sorry for the delay. I'm still thinking about what the path forward is here.

That 'stuck with __sk_buff' is what bothers me.
It's an indication that the API wasn't thought through if the first thing
it needs is this fake skb hack.

If bpf_flow.c is a realistic example of such a flow dissector prog,
it means that real skb fields are accessed, in particular
skb->vlan_proto and skb->protocol.
With a 'fake skb' these fields will not be set, since eth_type_trans()
hasn't been called yet.

So either flow_dissector needs a real __sk_buff, and all of its fields
should be real, or it's a different flow_dissector prog type that needs
only ctx->data, ctx->data_end and ctx->flow_keys.

Either way, going with a fake skb is incorrect: the bpf_flow.c example
will be broken, and program writers will have a hard time figuring out
why it's broken.
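
To make the second option concrete, here is a rough userspace sketch of a dissector that touches nothing but data, data_end and flow_keys. Everything in it is hypothetical: the struct names, the field layout and the function names are made up for illustration and are not the kernel's UAPI or the bpf_flow.c code. It also assumes any VLAN tags are still present in the packet bytes. The point is only that the EtherType can be read from the frame itself, so nothing depends on skb->protocol or skb->vlan_proto having been filled in by eth_type_trans().

```c
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this is not the kernel's UAPI. */
#define SK_ETH_HLEN     14      /* two 6-byte MACs + 2-byte EtherType */
#define SK_VLAN_HLEN    4       /* 2-byte TCI + 2-byte inner EtherType */
#define SK_ETH_P_8021Q  0x8100
#define SK_ETH_P_8021AD 0x88A8

struct flow_keys_sketch {
	uint16_t nhoff;         /* offset of the network header */
	uint16_t n_proto;       /* network-layer EtherType, host order */
};

/* Context with only the three members named in the mail above. */
struct flow_ctx_sketch {
	const uint8_t *data;
	const uint8_t *data_end;
	struct flow_keys_sketch *flow_keys;
};

/* Bounds-checked big-endian 16-bit load; returns -1 if out of range. */
static int read_be16(const struct flow_ctx_sketch *ctx, size_t off,
		     uint16_t *out)
{
	if ((size_t)(ctx->data_end - ctx->data) < off + 2)
		return -1;
	*out = (uint16_t)((ctx->data[off] << 8) | ctx->data[off + 1]);
	return 0;
}

/* Parse the L2 header (with up to two VLAN tags) from packet bytes only. */
int dissect_l2_sketch(struct flow_ctx_sketch *ctx)
{
	size_t off = SK_ETH_HLEN - 2;   /* EtherType follows the two MACs */
	uint16_t proto;
	int tags;

	if (read_be16(ctx, off, &proto) < 0)
		return -1;
	off += 2;

	for (tags = 0; tags < 2 &&
	     (proto == SK_ETH_P_8021Q || proto == SK_ETH_P_8021AD); tags++) {
		/* Skip the TCI and load the inner EtherType. */
		if (read_be16(ctx, off + 2, &proto) < 0)
			return -1;
		off += SK_VLAN_HLEN;
	}

	ctx->flow_keys->nhoff = (uint16_t)off;
	ctx->flow_keys->n_proto = proto;
	return 0;
}
```

A caller would point data and data_end at the received chunk, pass a zeroed flow_keys_sketch, and read nhoff and n_proto back on a zero return. A truncated frame yields -1 instead of a read past data_end.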