Jarkko Sakkinen <jar...@kernel.org> writes:

Hi Jarkko,

Thanks for the comments. Paul did a very nice job providing some
background info; allow me to provide some additional data.

> On Fri, Mar 21, 2025 at 09:45:02AM -0700, Blaise Boscaccy wrote:
>> This patch series introduces the Hornet LSM.
>>
>> Hornet takes a simple approach to light-skeleton-based eBPF signature
>
> Can you define "light-skeleton-based" before using the term.
>
> This is the first time in my life when I hear about it.
>

Sure. Here is the patchset where this stuff got introduced, if you are
curious.

https://lore.kernel.org/bpf/20220209054315.73833-1-alexei.starovoi...@gmail.com/

eBPF has loading requirements similar to those of modules: find
kallsyms addresses, fix up ELF relocations, handle struct field offsets
via a mechanism called CO-RE (compile once, run everywhere), and do
some other miscellaneous bookkeeping. During eBPF program compilation,
pseudo-values get written to the immediate operands of instructions.
During loading, those pseudo-values get rewritten with concrete
addresses or data applicable to the currently running system, e.g. a
kallsyms address or an fd for a map. This needs to happen before the
instructions for a bpf program are loaded into the kernel via the bpf()
syscall. Unlike modules, an in-kernel loader unfortunately doesn't
exist. Typically, the instruction rewriting is done dynamically in
userspace via libbpf (or the rust/go/python loader).

What skeletons do is generate a script of required
instruction-rewriting operations, which then gets played back at load
time against a hard-coded blob of raw instruction data. This removes
the need to distribute source code or object files.

There are two flavors of skeletons: normal skeletons and light
skeletons. Normal skeletons utilize relocation logic that lives in
libbpf, and the relocations/instruction rewriting happen in userspace.
The second flavor, light skeletons, uses a small eBPF program that
contains the relocation lookup logic.
As it's running in the kernel, it unpacks the target program, performs
the instruction rewriting, and loads the target program. Light
skeletons are currently utilized for some drivers and for BPF_PRELOAD
functionality, since they can operate without userspace.

Light skeletons were recommended in various mailing list discussions as
the preferred path to performing signature verification. There are some
PoCs floating around that used light skeletons in concert with
fs-verity/IMA and eBPF LSMs. We took a slightly different approach with
Hornet by utilizing the existing PKCS#7 signing scheme that is used for
kernel modules.

>> verification. Signature data can be easily generated for the binary
>
> s/easily//
>
> Useless word having no measure.
>

Ack, thanks.

>> data that is generated via bpftool gen -L. This signature can be
>
> I have no idea what that command does.
>
> "Signature data can be generated for the binary data as follows:
>
> bpftool gen -L
>
> <explanation>"
>
> Here you'd need to answer to couple of unknowns:
>
> 1. What is in exact terms "signature data"?

That is a PKCS#7 signature of a data buffer containing the raw
instructions of an eBPF program, followed by the initial values of any
maps used by the program.

> 2. What does "bpftool gen -L" do?
>

eBPF programs often have two parts: an orchestrator/loader program that
provides the load -> attach/run -> i/o -> teardown logic, and the
in-kernel program. That command is used to generate a skeleton which
can be used by the orchestrator program. Skeletons get generated as a C
header file that contains various autogenerated functions that open and
load bpf programs as described above. That header file ends up being
included in a userspace orchestrator program or possibly a kernel
module.

> This feedback maps to other examples too in the cover letter.
>
> BR, Jarkko

I'll rework this with some definitions of the eBPF subsystem jargon
along with your suggestions.

-blaise
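
P.S. In case it helps to see the "script played back against a blob"
idea concretely, below is a rough userspace sketch in Python. It is an
illustration only, not the code libbpf or a light skeleton actually
runs. The relocation list format, the names "my_map" and "some_symbol",
and the helper names ld_imm64() and apply_relocs() are made up for the
example. What it does assume from the real ABI is the 8-byte
struct bpf_insn layout on a little-endian machine, the two-slot 64-bit
immediate load (opcode 0x18), and BPF_PSEUDO_MAP_FD == 1.

```python
import struct

# struct bpf_insn on little-endian: u8 opcode, u8 regs (dst low nibble,
# src high nibble), s16 off, s32 imm. imm is handled as raw 32 bits here.
INSN = struct.Struct("<BBhI")

BPF_LD_IMM64 = 0x18        # BPF_LD | BPF_IMM | BPF_DW, occupies two slots
BPF_PSEUDO_MAP_FD = 1      # src_reg value meaning "imm holds a map fd"


def ld_imm64(dst_reg, src_reg, imm64):
    """Encode a two-slot 64-bit immediate load (used to build test blobs)."""
    first = INSN.pack(BPF_LD_IMM64, (src_reg << 4) | dst_reg, 0,
                      imm64 & 0xFFFFFFFF)
    second = INSN.pack(0, 0, 0, imm64 >> 32)
    return first + second


def apply_relocs(blob, relocs, map_fds, ksym_addrs):
    """Play a (hypothetical) relocation script back against an instruction blob.

    Each entry names the instruction slot to patch and what to patch it
    with: a map fd or a kernel symbol address, both resolved at load time.
    """
    insns = bytearray(blob)
    for r in relocs:
        pos = r["insn"] * INSN.size
        opcode, regs, off, _placeholder = INSN.unpack_from(insns, pos)
        if opcode != BPF_LD_IMM64:
            raise ValueError("slot %d is not a 64-bit immediate load" % r["insn"])
        dst = regs & 0x0F
        if r["kind"] == "map_fd":
            src, value = BPF_PSEUDO_MAP_FD, map_fds[r["name"]]
        elif r["kind"] == "ksym":
            src, value = 0, ksym_addrs[r["name"]]
        else:
            raise ValueError("unknown relocation kind: %r" % r["kind"])
        # Low 32 bits go in the first slot, high 32 bits in the second.
        INSN.pack_into(insns, pos, opcode, (src << 4) | dst, off,
                       value & 0xFFFFFFFF)
        INSN.pack_into(insns, pos + INSN.size, 0, 0, 0, value >> 32)
    return bytes(insns)
```

A caller would build or embed the blob once, then run
apply_relocs(blob, relocs, {"my_map": fd}, {"some_symbol": addr}) with
values looked up on the running system before handing the result to
bpf(). A light skeleton does the equivalent work from an eBPF program
inside the kernel rather than from userspace.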