On Mon, 17 Jun 2024 23:01:12 +0200
Alexander Graf <g...@amazon.com> wrote:
> > This could be an added feature, but it is very architecture specific,
> > and would likely need architecture specific updates.  
> 
> 
> It definitely would be an added feature, yes. But one that allows you to 
> ensure persistence a lot more safely :).

Sure.

> 
> Thinking about it again: What if you run the allocation super early (see 
> arch/x86/boot/compressed/kaslr.c:handle_mem_options())? If you stick to 
> allocating only from top, you're effectively kernel version independent 
> for your allocations because none of the kernel code ran yet and 
> definitely KASLR independent because you're running deterministically 
> before KASLR even gets allocated.
> 
> > As this code relies on memblock_phys_alloc() being consistent, if
> > something gets allocated before it differently depending on where the
> > kernel is, it can also move the location. A plugin to UEFI would mean
> > that it would need to reserve the memory, and the code here will need
> > to know where it is. We could always make the function reserve_mem()
> > global and weak so that architectures can override it.  
> 
> 
> Yes, the in-kernel UEFI loader (efi-stub) could simply populate a new 
> type of memblock with the respective reservations and you later call 
> memblock_find_in_range_node() instead of memblock_phys_alloc() to pass 
> in flags that you want to allocate only from the new 
> MEMBLOCK_RESERVE_MEM type. The same model would work for BIOS boots 
> through the handle_mem_options() path above. In fact, if the BIOS way 
> works fine, we don't even need UEFI variables: The same way allocations 
> will be identical during BIOS execution, they should stay identical 
> across UEFI launches.
> 
> As cherry on top, kexec also works seamlessly with the special memblock 
> approach because kexec (at least on x86) hands memblocks as is to the 
> next kernel. So the new kernel will also automatically use the same 
> ranges for its allocations.

I'm all for expanding this. But I would just want to get this in for
now as is. It theoretically works on all architectures. If someone
wants to make it more robust and accurate on a specific architecture,
I'm all for it. Like I said, we could make the reserve_mem() function
global and weak, and then if an architecture has a better way to handle
this, it could use that.
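
As a rough illustration of the "global and weak" idea, here is a minimal
userspace sketch, not the actual patch. The names reserve_mem_alloc(),
fake_mem_top and the local phys_addr_t typedef are hypothetical stand-ins
invented for this example; in the kernel the default would use
memblock_phys_alloc() and be marked __weak. The default below hands out
memory top-down from a simulated limit. An architecture would supply a
non-weak definition of the same symbol in its own file, and the linker
would pick that one instead.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's phys_addr_t. */
typedef uint64_t phys_addr_t;

/* Simulated top of free memory (1 GiB); purely illustrative. */
static phys_addr_t fake_mem_top = 0x40000000ULL;

/*
 * Generic default, marked weak. If another object file provides a
 * regular (strong) definition of reserve_mem_alloc(), the linker uses
 * that one and silently drops this fallback.
 *
 * Allocates 'size' bytes top-down, aligned to 'align' (which must be a
 * power of two). Returns the base address, or 0 on failure.
 */
__attribute__((weak))
phys_addr_t reserve_mem_alloc(phys_addr_t size, phys_addr_t align)
{
	phys_addr_t base;

	if (size == 0 || align == 0 || (align & (align - 1)))
		return 0;
	if (size > fake_mem_top)
		return 0;

	/* Step down from the current top, then round down to 'align'. */
	base = (fake_mem_top - size) & ~(align - 1);
	fake_mem_top = base;
	return base;
}
```

With only the weak default linked in, two 4K requests come back
back-to-back just below the simulated top. An arch override (e820 based
on x86, for example) would only need to keep the same signature.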

Hmm, x86 could do this with the e820 code like I did in my first
versions. Like I said, it didn't fail at all with that.

And we can have a UEFI version as well.

-- Steve
