On Mon, 17 Jun 2024 23:01:12 +0200
Alexander Graf <g...@amazon.com> wrote:

> > This could be an added feature, but it is very architecture specific,
> > and would likely need architecture specific updates.
>
> It definitely would be an added feature, yes. But one that allows you to
> ensure persistence a lot more safely :).
Sure.

>
> Thinking about it again: What if you run the allocation super early (see
> arch/x86/boot/compressed/kaslr.c:handle_mem_options())? If you stick to
> allocating only from top, you're effectively kernel version independent
> for your allocations because none of the kernel code ran yet and
> definitely KASLR independent because you're running deterministically
> before KASLR even gets allocated.
>
> > As this code relies on memblock_phys_alloc() being consistent, if
> > something gets allocated before it differently depending on where the
> > kernel is, it can also move the location. A plugin to UEFI would mean
> > that it would need to reserve the memory, and the code here will need
> > to know where it is. We could always make the function reserve_mem()
> > global and weak so that architectures can override it.
>
> Yes, the in-kernel UEFI loader (efi-stub) could simply populate a new
> type of memblock with the respective reservations and you later call
> memblock_find_in_range_node() instead of memblock_phys_alloc() to pass
> in flags that you want to allocate only from the new
> MEMBLOCK_RESERVE_MEM type. The same model would work for BIOS boots
> through the handle_mem_options() path above. In fact, if the BIOS way
> works fine, we don't even need UEFI variables: The same way allocations
> will be identical during BIOS execution, they should stay identical
> across UEFI launches.
>
> As cherry on top, kexec also works seamlessly with the special memblock
> approach because kexec (at least on x86) hands memblocks as is to the
> next kernel. So the new kernel will also automatically use the same
> ranges for its allocations.

I'm all for expanding this. But I would just want to get this in for now
as is. It theoretically works on all architectures. If someone wants to
make it more robust and accurate on a specific architecture, I'm all for
it.
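For illustration only, the "global and weak" idea from the quoted text could look roughly like the sketch below. It is plain userspace C, not kernel code: the reserve_mem() signature is simplified and hypothetical, generic_phys_alloc() is a made-up stand-in for memblock_phys_alloc(), and the top-of-memory address is an arbitrary assumption. The kernel would spell the attribute as __weak.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the generic allocator (the real code relies
 * on memblock_phys_alloc()). It carves regions downward from an assumed
 * top of memory, so the same sequence of requests always yields the
 * same addresses.
 */
static uint64_t generic_phys_alloc(uint64_t size, uint64_t align)
{
	static uint64_t top = 0x100000000ULL;	/* assumed top of usable RAM */

	top = (top - size) & ~(align - 1);	/* align must be a power of two */
	return top;
}

/*
 * Generic default, marked weak. An architecture that has a better
 * source of truth (e820, a UEFI-provided reservation, ...) would supply
 * a non-weak reserve_mem() in its own source file, and the linker would
 * pick that definition instead of this one.
 *
 * Simplified, hypothetical signature: returns 0 on success, -1 on a
 * bad size or alignment.
 */
__attribute__((weak))
int reserve_mem(uint64_t size, uint64_t align, uint64_t *start)
{
	if (!size || !align || (align & (align - 1)))
		return -1;

	*start = generic_phys_alloc(size, align);
	return 0;
}
```

With no override linked in, callers get the generic behaviour; nothing at the call site changes when an architecture adds its own version.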
Like I said, we could make the reserve_mem() function global and weak,
and then if an architecture has a better way to handle this, it could use
that.

Hmm, x86 could do this with the e820 code like I did in my first
versions. Like I said, it didn't fail at all with that. And we can have a
UEFI version as well.

-- Steve