Daniel Kiper writes:
> On Thu, Sep 02, 2021 at 12:48:24AM +1000, Daniel Axtens wrote:
>> Patrick Steinhardt writes:
>>
>> > Currently, all platforms will set up their heap on initialization of the
>> > platform code. While this works mostly fine, it poses some limitations
>> > on memory management on us. Most notably, allocating big chunks of
>> > memory in the gigabyte range would require us to pre-request this many
>> > bytes from the firmware and add it to the heap from the beginning on
>> > some platforms like EFI. As this isn't needed for most configurations,
>> > it is inefficient and may even negatively impact some usecases when,
>> > e.g., chainloading. Nonetheless, allocating big chunks of memory is
>> > required sometimes, where one example is the upcoming support for the
>> > Argon2 key derival function in LUKS2.
>> >
>> > In order to avoid pre-allocating big chunks of memory, this commit
>> > implements a runtime mechanism to add more pages to the system. When a
>> > given allocation cannot be currently satisfied, we'll call a given
>> > callback set up by the platform's own memory management subsystem,
>> > asking it to add a memory area with at least `n` bytes. If this
>> > succeeds, we retry searching for a valid memory region, which should now
>> > succeed.
>> >
>>
>> I implemented this for ieee1275-powerpc. I set the initial memory claim
>> to 1MB to match EFI and to exercise the code.
>>
>> Thoughts as I progressed:
>>
>> - You probably need to think about how to satisfy requests with
>>particular alignments: currently there is no way to specify that with
>>the current interface, and I saw powerpc-ieee1275 return bunch of
>>allocations at e.g 0x2a561e which is not particularly well aligned!
>
> I think at "firmware memory allocation" level we could always allocate
> page aligned regions. Of course this may not satisfy allocations
> aligned at larger values than a page. Though I think we can solve this
> by allocating larger regions from firmware. Please look below for more
> details...
>
>> - You haven't included in the calculations the extra space required for
>>mm housekeeping. For example, I'm seeing an allocation for 32kB be
>>requested via grub_mm_add_region_fn. I dutifully allocate 32kB, with
>>no alignment requirements, and pass that pointer to
>>grub_mm_init_region. grub_mm_init_region throws away bytes at the
>>start to get to GRUB_MM_ALIGN, then uses some bytes for the
>>grub_mm_header_t, then any actual allocation uses bytes for the
>>malloc metadata. So the actual underlying allocation cannot be
>>satisfied.
>>
>>I think you get away with this on EFI because you use BYTES_TO_PAGES
>>and get page-aligned memory, but I think you should probably round up
>>to the next power of 2 for smaller allocations or to the next page or
>>so for larger allocations.
>
> I think we could allocate at least e.g. 128 MiB from firmware if there is
> not enough memory available in the GRUB mm. This way we would avoid frequent
> calls to firmware and could satisfy requests for larger alignments.
>
>> - After fixing that in the ieee1275 code, all_functional_test
>>hangs trying to run the cmdline_cat test. I think this is from a slow
>>algorithm somewhere - the grub allocator isn't exactly optimised for
>>a proliferation of regions.
>
> Could you try the solution proposed above? Maybe it will solve problem of
> frequent additions of memory to the GRUB mm.
>
>> - I noticed that nearly all the allocations were under 1MB. This seems
>>inefficient for a trip out to firmware. So I made the ieee1275 code
>>allocate at least max(4MB, (size of allocation rounded up nearest
>>1MB) + 4kB). This makes the tests run with only the usual failures,
>>at least on pseries with debug on... still chasing some bugs beyond
>>that.
>
> Yeah, this is similar to what I proposed above. Though I would want to see
> larger numbers tested as I said earlier.
>
>> - The speed impact depends on the allocation size. I'll post something
>>on that tomorrow, hopefully, but larger minimum allocations work
>>noticably better.
>>
>> - We only have 4GB max to play with because (at least) powerpc-ieee1275
>>is technically defined to be 32 bit. So I'm a bit nervous about
>>further large allocations unless we have a way to release them back
>>to _firmware_, not just to grub.
>
> Ugh... This can be difficult. I am not sure the GRUB mm is smart enough
> to release memory regions if they are not used anymore by it.
I didn't explain this well. What I'm trying to say is this:
Say you require 1GB of memory temporarily.
You do this:
{
void *mem = grub_large_malloc(1024 * 1024 * 1024);
operate_upon(mem);
grub_large_free(mem);
}
large_malloc and large_free go directly to firmware. We bypass the
existing grub mm infrastructure completely. This way, people who need
this sort of very large allocation do it in a way