First of all, and just to join the crowd, kernel/bpf/ FTW. Now, I have some suggestions about eBPF. IMO classic BPF is an ISA oriented to filter (meaning returning a single integer that states how many bytes of the packet must be captured) packets (e.g. consider the 6 load modes, where 3 provide access the packet -- abs, ind, msh --, one to an skb field -- len--, the 5th one to the memory itself -- mem --, and the 6th is an immediate set mode --imm-- ) that has been used in other environments (seccomp, tracing, etc.) by (a) extending the idea of a "packet" into a "buffer", and (b) adding ancillary loads.
eBPF should be a generic ISA that can be used by many environments, including those served today by classic BPF. IMO, we should get a nicely-defined ISA (MIPS anyone?) and check what should go into eBPF and what should not. - 1. we should considering separating the eBPF ISA farther from classic BPF - eBPF still uses a_reg and x_reg as the names of the 2 op registers. This is very confusing, especially when dealing with translated filters that do move data between A and X. I've had a_reg being X, and x_reg being A. We should rename them d_reg and s_reg. - BPF_LD vs. BPF_LDX: this made sense in classic BPF, as there was only one register, and d_reg was implicit in the name of the insn code. Now, why are we keeping both in eBPF, when the register we're writing to is made explicit in d_reg (I already forgot if d_reg was a_reg or x_reg ;) ? Removing one of them will save us 1/8th of the insns. - BPF_ST vs. BPF_STX: same here. Note that the current sk_convert_filter() just converts all stores to BPF_STX. - 2. there are other insn that we should consider adding: - lui: AFAICT, there is no clean way to build a 64-bit number (you can LD_IMM the upper part, lsh 32, and then add the lower part). - nop: I'd like to have a nop. Do I know why? Nope. On Tue, Jun 3, 2014 at 1:58 PM, Alexei Starovoitov <a...@plumgrid.com> wrote: > On Tue, Jun 3, 2014 at 1:35 PM, Daniel Borkmann <dbork...@redhat.com> wrote: >> On 06/03/2014 05:44 PM, Alexei Starovoitov wrote: >> ... >>> >>> All of your points are valid. They are right questions to ask. I just >>> >>> don't see why you're still arguing about first step of filter.c split, >>> whereas your concerns are about steps 2, 3, 4. >> >> >> Fair enough, lets keep them in mind though for future work. Btw, > > Ok :) > >> are other files planned for kernel/bpf/ or should it instead just >> simply be kernel/bpf.c? > > The most obvious one is eBPF verifier in separate file (kernel/bpf/verifier.c) > bpf maps is yet another thing, but that's different topic. > Probably a set of bpf-callable functions in another file. Like right now > for sockets these helpers are __skb_get_pay_offset(), __skb_get_nlattr() > For tracing there will be a different set of helper functions and eventually > some will be common. Like __get_raw_cpu_id() from filter.c could > eventually move to kernel/bpf/helpers.c LGTM. I like the idea of every user (packet filter, seccomp, etc.) providing a map of the bpf calls that are ok, as in the packet filter stating that {1->__skb_get_pay_offset(), 2->__skb_get_nlattr(), ...}, but seccomp providing a completely different (or even empty) map. -Chema > I'm not a fan of squeezing different logic into one file. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/