Yuri Benditovich <yuri.benditov...@daynix.com> writes: > On Fri, Dec 4, 2020 at 12:09 PM Toke Høiland-Jørgensen <t...@redhat.com> > wrote: > >> Yuri Benditovich <yuri.benditov...@daynix.com> writes: >> >> > On Wed, Dec 2, 2020 at 4:18 PM Toke Høiland-Jørgensen <t...@redhat.com> >> > wrote: >> > >> >> Jason Wang <jasow...@redhat.com> writes: >> >> >> >> > On 2020/11/19 下午7:13, Andrew Melnychenko wrote: >> >> >> This set of patches introduces the usage of eBPF for packet steering >> >> >> and RSS hash calculation: >> >> >> * RSS(Receive Side Scaling) is used to distribute network packets to >> >> >> guest virtqueues by calculating packet hash >> >> >> * Additionally adding support for the usage of RSS with vhost >> >> >> >> >> >> The eBPF works on kernels 5.8+ >> >> >> On earlier kerneld it fails to load and the RSS feature is reported >> >> >> only without vhost and implemented in 'in-qemu' software. >> >> >> >> >> >> Implementation notes: >> >> >> Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program. >> >> >> Added libbpf dependency and eBPF support. >> >> >> The eBPF program is part of the qemu and presented as an array >> >> >> of BPF ELF file data. >> >> >> The compilation of eBPF is not part of QEMU build and can be done >> >> >> using provided Makefile.ebpf(need to adjust 'linuxhdrs'). >> >> >> Added changes to virtio-net and vhost, primary eBPF RSS is used. >> >> >> 'in-qemu' RSS used in the case of hash population and as a fallback >> >> option. >> >> >> For vhost, the hash population feature is not reported to the guest. >> >> >> >> >> >> Please also see the documentation in PATCH 5/5. >> >> >> >> >> >> I am sending those patches as RFC to initiate the discussions and get >> >> >> feedback on the following points: >> >> >> * Fallback when eBPF is not supported by the kernel >> >> >> * Live migration to the kernel that doesn't have eBPF support >> >> >> * Integration with current QEMU build >> >> >> * Additional usage for eBPF for packet filtering >> >> >> >> >> >> Known issues: >> >> >> * hash population not supported by eBPF RSS: 'in-qemu' RSS used >> >> >> as a fallback, also, hash population feature is not reported to >> guests >> >> >> with vhost. >> >> >> * big-endian BPF support: for now, eBPF isn't supported on >> >> >> big-endian systems. Can be added in future if required. >> >> >> * huge .h file with eBPF binary. The size of .h file containing >> >> >> eBPF binary is currently ~5K lines, because the binary is built with >> >> debug information. >> >> >> The binary without debug/BTF info can't be loaded by libbpf. >> >> >> We're looking for possibilities to reduce the size of the .h files. >> >> > >> >> > >> >> > Adding Toke for sharing more idea from eBPF side. >> >> > >> >> > We had some discussion on the eBPF issues: >> >> > >> >> > 1) Whether or not to use libbpf. Toke strongly suggest to use libbpf >> >> > 2) Whether or not to use BTF. Toke confirmed that if we don't access >> any >> >> > skb metadata, BTF is not strictly required for CO-RE. But it might >> still >> >> > useful for e.g debugging. >> >> > 3) About the huge (5K lines, see patch #2 Toke). Toke confirmed that >> we >> >> > can strip debug symbols, but Yuri found some sections can't be >> stripped, >> >> > we can keep discussing here. >> >> >> >> I just tried simply running 'strip' on a sample trivial XDP program, >> >> which brought its size down from ~5k to ~1k and preserved the BTF >> >> information without me having to do anything. >> >> >> > >> > With our eBPF code the numbers are slightly different: >> > The code size without BTF: 7.5K (built without '-g') >> > Built with '-g': 45K >> > Stripped: 19K >> > The difference between 7.5 and 19K still seems significant, especially >> when >> > we do not use any kernel structures and do not need these BTF sections >> >> That does seem like a lot of BTF information. Did you confirm (with >> objdump) that it's the .BTF* sections that take up these extra 12k? Do >> you have some really complicated data structures in the file or >> something? Got a link to the source somewhere that isn't a web mailing >> list archive? :) >> >> > Looks like the extra size is related to BTF: there are 4 BTF sections that > take 12.5K > [ 7] .BTF PROGBITS 0000000000000000 00144c 00175d 00 > 0 0 1 > [ 8] .rel.BTF REL 0000000000000000 002bb0 000040 10 > 14 7 8 > [ 9] .BTF.ext PROGBITS 0000000000000000 002bf0 000cd0 00 > 0 0 1 > [10] .rel.BTF.ext REL 0000000000000000 0038c0 000ca0 10 > 14 9 8
Right, okay, that does not look completely outrageous with the amount of code and type information you have in that file. > All the sources are at: > The branch without libbpf > https://github.com/daynix/qemu/tree/eBPF_RFC > The branch with libbpf > https://github.com/daynix/qemu/tree/eBPF_RFCv2 > > all the eBPF-related code is under qemu/ebpf directory. Ah, cool, thanks! >> In any case, while I do think it smells a little of premature >> optimisation, you can of course strip the BTF information until you need >> it. Having it around makes debugging easier (bpftool will expand your >> map structures for you when dumping maps, and that sort of thing), but >> it's not really essential if you don't need CO-RE. >> >> > This is only reason to prefer non-libbpf option for this specific eBPF >> >> You can still use libbpf without BTF. It's using BTF without libbpf that >> tends to not work so well... >> >> > If we build the eBPF without '-g' or strip the BTF information out of the > object file the libbpf crashes right after issuing printout "libbpf: BTF is > required, but is missing or corrupted". > We did not investigate this too deeply but on the first glance it looks > like the presence of maps automatically makes the libbpf to require BTF. Ah, right. Well, you're using the BTF-based map definition syntax. So yeah, that does require BTF: The __uint() and __type() macros really expand to type definitions that are specifically crafted to be embedded as BTF in the file. You could use the old-style map definitions that don't use BTF[0], but BTF is really where things are going in BPF-land so I think longer term you'll probably end up needing it anyway. So going to this much trouble just to save 10k on binary size seems to me like it's a decision you'll end up regretting :) [0] https://github.com/xdp-project/xdp-tutorial/blob/master/basic03-map-counter/xdp_prog_kern.c#L11 -Toke