> I agree fixing offsets in cbpf to ebpf converter and passing mbuf is easier. > There is still the pathological case of multi segment mbuf. But > Linux XDP doesn't handle it either.
Why do you think multi-seg would be a problem? If we’ll use rte_pktmbuf_read() for interpreter and something similar for jit, multi-seg case should be covered, no? > > Let me put early version of filter2rteebf on GitHub > > On Thu, Feb 6, 2020, 8:54 AM Morten Brørup > <mailto:m...@smartsharesystems.com> wrote: > > From: dev [mailto:mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev, > > Konstantin > > Sent: Wednesday, February 5, 2020 10:16 PM > > > > > > > > As I mentioned in my FOSDEM talk the current DPDK eBPF handling is > > > not usable for packet filters. I have ported the classic BPF to eBPF > > code > > > and the generated code is not usable by DPDK. > > > > > > The problem is that DPDK eBPF does not implement all the opcodes. > > > BPF_ABS is not implemented and must be. It is in the Linux kernel. > > > > Yep, it doesn't. > > This is not a generic eBPF instruction, but sort of implicit function > > call > > to access skb data. At initial stage of librte_bpf development, > > I didn't think much about cBPF conversion, and to simplify things > > decided to put all special skb features aside. > > But sure, if that will enable DPDK cBPF support, let's add it in. > > Please fill a Bugzilla ticket to track that issue, and I'll try to > > find some time within 20.05 to look at it. > > Unless of course, you or someone else would like to volunteer for it. > > > > Though at first step, we probably need to decide what should be > > our requirements for it in terms of DPDK. > > From https://www.kernel.org/doc/Documentation/networking/filter.txt: > > "eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and > > (BPF_IND | <size> | BPF_LD) which are used to access packet data. > > They had to be carried over from classic to have strong performance of > > socket filters running in eBPF interpreter. These instructions can only > > be used when interpreter context is a pointer to 'struct sk_buff' and > > have seven implicit operands. Register R6 is an implicit input that > > must > > contain pointer to sk_buff. Register R0 is an implicit output which > > contains > > the data fetched from the packet. Registers R1-R5 are scratch registers > > and must not be used to store the data across BPF_ABS | BPF_LD or > > BPF_IND | BPF_LD instructions. > > These instructions have implicit program exit condition as well. When > > eBPF program is trying to access the data beyond the packet boundary, > > the interpreter will abort the execution of the program. JIT compilers > > therefore must preserve this property. src_reg and imm32 fields are > > explicit inputs to these instructions. > > For example: > > BPF_IND | BPF_W | BPF_LD means: > > R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32)) > > and R1 - R5 were scratched." > > > > For RTE_BPF_ARG_PTR_MBUF context we probably want behavior similar > > to linux, i.e. BPF_IND | BPF_W | BPF_LD would mean something like: > > > > 1) uint32_t tmp; > > R0 = &tmp; > > R0 = rte_pktmbuf_read((const struct rte_mbuf *)R6, src_reg + > > imm32, > > sizeof(uint32_t), R0); > > if (R0 == NULL) return FAILED; > > R0 = ntohl(*(uint32_t *)R0); > > > > But what to do with when ctx is raw data buffer (RTE_BPF_ARG_PTR)? > > Should it be just: > > 2) R0 = ntohl(*(uint32_t *)(R6 + src_reg + imm32)); > > 3) not allow LD_ABS/LD_IND in this mode at all. > > > > I think that 1) is the correct choice for the "cBPF filter" use case. > > In that context I consider both 2) and 3) irrelevant because the > RTE_BPF_ARG_PTR_MBUF type should be used for cBPF filtering. I have > argued for this before. > > Others have argued for using the RTE_BPF_ARG_PTR type instead. Let's consider > using the RTE_BPF_ARG_PTR for a moment... Is there an > implicit understanding that the data points to packet data? Then a range > check in 2) might be relevant. > > However, if the RTE_BPF_ARG_PTR_MBUF type is supported and used for > cBPF->eBPF conversion, we would not need to support > LD_ABS/LD_IND for the RTE_BPF_ARG_PTR type. So between 2) and 3), I support > 3). > > > Second question is implementation. > > I can see two main options here: > > a) if we plan to have our own cBPF->eBPF converter and support only it, > > we can add these extra instructions generation into converter itself. > > I.E. cBPF->eBPF conversion for LD_ABS/LD_IND will generate series > > of generic eBPF instructions. > > b) support eBPF LD_ABS/LD_IND in eBPF interpreter/jit > > > > (a) probably a simpler way (eBPF interpreter/jit/verifier would remain > > unchanged), > > but seems way too limited. So I think (b) is a better choice, even more > > work implied > > (interpreter seems more or less straightforward, jit would probably > > need some effort). > > > > This is going to be used in the fast path, probably on all packets on an > interface. So clearly b). > > > Any thoughts/opinions? > > Konstantin > > One more piece of information: Linux cBPF supports Auxiliary data (VLAN ID, > Interface Index, etc.), i.e. metadata that are not part of the > packet data, but can be found in the sk_buff/mbuf. Going for 1) and b) might > make it easier adding support for these later. > > > Med venlig hilsen / kind regards > - Morten Brørup