On 03/04/2018 08:40 PM, Florian Westphal wrote:
> These patches, which go on top of the 'bpfilter' RFC patches,
> demonstrate an nftables to ebpf translation (done in userspace).
> In order to not duplicate the ebpf code generation efforts, the rules
>
> iptables -i lo -d 127.0.0.2 -j DROP
> and
> nft add rule ip filter input ip daddr 127.0.0.2 drop
>
> are first translated to a common intermediate representation, and then
> to ebpf, which attaches resulting prog to the XDP hook.
>
> IMR representation is identical in both cases so therefore both
> rules result in the same ebpf program.
>
> The IMR currently assumes that translation will always be to ebpf.
> As per previous discussion it doesn't consider other targets, so
> for instance IMR pseudo-registers map 1:1 to ebpf ones.
>
> The IMR is also supposed to be generic enough to make it easy to convert
> 'fronted' formats (iptables rule blob, nftables netlink) to it, and
> also extend it to cover ip rule, ovs or any other inputs in the future
> without need for major changes to the IMR.
>
> The IMR currently implements following basic operations:
> - Relational (equal, not equal)
> - immediates (32 and 64bit constants)
> - payload with relative addressing (macr, network, transport header)
> - verdict (pass, drop, next rule)
>
> Its still in early stage, but I think its good enough as
> a proof-of-concept.
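
To make the four basic operations listed above concrete, here is a rough
sketch of how they could be modeled. It is only an illustration under
assumptions: every type and function name (imr_rule, imr_load,
imr_eval_chain, imr_demo) is hypothetical and not taken from the posted
patches, and it interprets the rule directly in C instead of emitting
eBPF. The demo rule corresponds to "ip daddr 127.0.0.2 drop".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical, not those used in the patches. */
enum imr_verdict { IMR_NEXT, IMR_PASS, IMR_DROP };
enum imr_base { IMR_BASE_LL, IMR_BASE_NET, IMR_BASE_TRANSPORT, IMR_BASE_MAX };
enum imr_relop { IMR_EQ, IMR_NE };

/* Payload with relative addressing: header base + offset + length. */
struct imr_payload {
	enum imr_base base;
	uint16_t offset;
	uint16_t len;		/* 1..8 bytes */
};

/* One rule: payload <relop> immediate => verdict. */
struct imr_rule {
	struct imr_payload payload;
	enum imr_relop op;
	uint64_t imm;		/* value of the big-endian field */
	enum imr_verdict verdict;
};

struct imr_pkt {
	const uint8_t *data;
	size_t len;
	size_t hdr_off[IMR_BASE_MAX];	/* start of each header in data */
};

/* Bounds-checked load of a big-endian field; -1 if out of range. */
static int imr_load(const struct imr_pkt *pkt, const struct imr_payload *p,
		    uint64_t *out)
{
	size_t start, i;
	uint64_t v = 0;

	assert(p->base < IMR_BASE_MAX);
	start = pkt->hdr_off[p->base] + p->offset;
	if (p->len == 0 || p->len > 8 || start + p->len > pkt->len)
		return -1;
	for (i = 0; i < p->len; i++)
		v = (v << 8) | pkt->data[start + i];
	*out = v;
	return 0;
}

/* First matching rule with a final verdict wins; default is pass. */
static enum imr_verdict imr_eval_chain(const struct imr_rule *rules, size_t n,
				       const struct imr_pkt *pkt)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t v;
		int match;

		if (imr_load(pkt, &rules[i].payload, &v) < 0)
			continue;	/* field not present: no match */
		match = (v == rules[i].imm);
		if (rules[i].op == IMR_NE)
			match = !match;
		if (match && rules[i].verdict != IMR_NEXT)
			return rules[i].verdict;
	}
	return IMR_PASS;
}

/* Ethernet + IPv4 frame with daddr 127.0.0.<last_octet>, one drop rule. */
static enum imr_verdict imr_demo(uint8_t last_octet)
{
	uint8_t frame[34] = { 0 };
	struct imr_pkt pkt = { frame, sizeof(frame), { 0, 14, 34 } };
	/* IPv4 daddr sits at offset 16 of the network header. */
	struct imr_rule rule = {
		{ IMR_BASE_NET, 16, 4 }, IMR_EQ, 0x7f000002, IMR_DROP
	};

	frame[12] = 0x08;	/* ethertype 0x0800 (IPv4) */
	frame[14] = 0x45;	/* version 4, IHL 5 */
	frame[30] = 127;	/* daddr = 127.0.0.<last_octet> */
	frame[33] = last_octet;
	return imr_eval_chain(&rule, 1, &pkt);
}
```

With this sketch, imr_demo(2) yields the drop verdict and any other last
octet falls through to the default pass. A real backend would emit the
equivalent load, compare and return instructions rather than interpret.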

Thanks a lot for working on this! Overall I like the PoC and the
underlying idea of it! I think the design of such an IMR would indeed be
the critical part, in that it needs to be flexible enough to cover both
front ends well without having to make compromises for either of them.
The same goes for optimization passes, e.g. when we know that two
successive rules match on TCP header bits, we can reuse the register
that already loaded/parsed them. Similarly for maps, when the lookup
value needs to propagate through the linked imr objects. Do you see such
optimizations, or propagation of state in general, as a direct part of
the IMR, or rather somewhat hidden in the IMR layer when doing the IMR
to BPF 'jit' phase?

Which other parts do you think would be needed for the IMR aside from
the above basic operations? ALU ops, JMPs for the relational ops?

I think it would be good to have clear semantics in terms of what it
would eventually abstract away from raw BPF when sitting on top of it;
potentially these could be details on packet access or interaction with
helpers or other BPF features, such that it's BPF prog type independent
at this stage. Another point could be that, given the priv/cap level of
the uapi request, there could also be different BPF code gen backends
that implement against the IMR, e.g. when the request comes out of a
userns then it has feature constraints such as having to use
bpf_skb_{load,store}_bytes() helpers for packet access instead of direct
packet access, or not being able to use BPF to BPF calls, etc; wdyt?

> Known differences between nftjit.ko and bpfilter.ko:
> nftjit.ko currently doesn't run transparently, but thats only
> because I wanted to focus on the IMR and get the POC out of the door.
>
> It should be possible to get it transparent via the bpfilter.ko approach.
>
> Next steps for the IMR could be addition of binary operations for
> prefixes ("-d 192.168.0.1/24"), its also needed e.g. for tcp flag
> matching (-p tcp --syn in iptables) and so on.
>
> I'd also be interested in wheter XDP is seen as appropriate
> target hook. AFAICS the XDP and the nftables ingress hook are similar
> enough to consider just (re)using the XDP hook to jit the nftables ingress
> hook. The translator could check if the hook is unused, and return
> early if some other program is already attached.
>
> Comments welcome, especially wrt. IMR concept and what might be
> next step(s) in moving forward.
>
> The patches are also available via git at
> https://git.breakpoint.cc/cgit/fw/net-next.git/log/?h=bpfilter7 .

Thanks,
Daniel