Hi,

I want to bring a discussion here that I have mostly been having with myself on IRC over the past few days.
As some of you know, we had a couple of issues with large initrds in Ubuntu, and Jeremy posted a patch series earlier about mmunlimited. I want to propose a more fine-grained, and also more generic, approach to handling large allocations.

The first issue one hits when opening large initrds is that grub_file_open() calls grub_verifier_open(), which simply grub_malloc()s a buffer the size of the file. Later, for the initrd, we have to allocate it a second time: in the upstream tree that happens via the relocator, in the rhboot tree it is allocated directly from EFI.

Now my basic proposal is quite simple: we make grub_malloc() and that relocator allocation code bypass the GRUB memory management altogether and just do raw EFI page allocations (provide two function pointers, grub_mm_allocate_pages and grub_mm_free_pages, and just call them if the allocation size is large[1]).

E.g. at the start of grub_malloc():

    if (size > @100 pages@ && grub_mm_allocate_pages != NULL) {
        ret = grub_mm_allocate_pages_below(@4GB@, ..., ROUND_TO_PAGES(size));
        if (ret == NULL)
            ret = grub_mm_allocate_pages_below(@infinity@, ..., ROUND_TO_PAGES(size));
        return ret;
    }

Allocating those below 4GB, and only falling back to above 4GB when we run out of space, lets us avoid most issues where DMA fails above 4GB.

But then we also patch grub_file_read() to check whether the target buffer is located above 4GB and, if so, use a bounce buffer to copy the data, so that we avoid even more of those issues. We add something like this to the start of it:

    if ((grub_addr_t) buf > @4GB@) {
        return read_buffered(file, buf, len);
    }

where read_buffered() is like in rhboot's EFI loader:

    #define BOUNCE_BUFFER_MAX 0x1000000ull

    static grub_ssize_t
    read_buffered(grub_file_t file, grub_uint8_t *bufp, grub_size_t len)
    {
      grub_ssize_t bufpos = 0;
      static grub_size_t bbufsz = 0;
      static char *bbuf = NULL;

      if (bbufsz == 0)
        bbufsz = MIN(BOUNCE_BUFFER_MAX, len);

      /* Halve the bounce buffer size until the allocation succeeds. */
      while (!bbuf && bbufsz)
        {
          bbuf = grub_malloc(bbufsz);
          if (!bbuf)
            bbufsz >>= 1;
        }
      if (!bbuf)
        {
          grub_error (GRUB_ERR_OUT_OF_MEMORY,
                      N_("cannot allocate bounce buffer"));
          return -1; /* do not go on to read into a NULL buffer */
        }

      /* Read in chunks through the bounce buffer, then copy to the target. */
      while (bufpos < (long long)len)
        {
          grub_ssize_t sz;

          sz = grub_file_read (file, bbuf, MIN(bbufsz, len - bufpos));
          if (sz < 0)
            return sz;
          if (sz == 0)
            break;

          grub_memcpy(bufp + bufpos, bbuf, sz);
          bufpos += sz;
        }

      return bufpos;
    }

Now we still end up allocating each file twice, but the verifier copy is allocated from, and released back to, the EFI system. This means that we allocate a lot fewer regions and have outsourced the problem of releasing the memory after it's been used to the firmware :)

Of course, ultimately we would want to avoid the double allocation altogether, so it might make sense to provide a way to directly allocate the buffer we need, such as:

    typedef void * (*grub_allocator)(size_t bytes);

    grub_file_t grub_file_open_alloc(const char *name,
                                     enum grub_file_type type,
                                     grub_allocator allocator);

or a function that simply reads a file at a path into a buffer:

    void *grub_file_open_read_close(const char *name,
                                    enum grub_file_type type,
                                    grub_allocator allocator);

The latter simply allocates the buffer by calling allocator, reads into it, then verifies the content using the verifier framework before returning it.

So if we want to load an initrd, we write a function that allocates an initrd using whatever policies the kernel needs there, and then do

    initrd_buf = grub_file_open_read_close(path,
                     GRUB_FILE_TYPE_LINUX_INITRD | GRUB_FILE_TYPE_NO_DECOMPRESS,
                     initrd_alloc);

and then we're done and don't need to allocate and read each file twice.
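
To make the allocator-callback idea a bit more concrete, here is a rough, self-contained sketch in plain C. It is not GRUB code: stdio stands in for the GRUB file API, malloc() stands in for whatever allocation policy the loader needs, and the names allocator_fn, file_open_read_close and the size_out parameter are made up for illustration. The verifier step is left out entirely.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocator callback: the caller decides where the buffer lives. */
typedef void *(*allocator_fn) (size_t bytes);

/* Stand-in for a loader-specific policy (e.g. "page aligned, below 4GB").
   In this sketch it simply forwards to malloc(). */
static void *
initrd_alloc (size_t bytes)
{
  return malloc (bytes ? bytes : 1);
}

/* Read the whole file at PATH into a buffer obtained from ALLOC.
   Returns the buffer and stores its size in *SIZE_OUT, or returns NULL on
   error.  A real implementation would verify the buffer before returning
   it; that step is omitted here. */
static void *
file_open_read_close (const char *path, allocator_fn alloc, size_t *size_out)
{
  FILE *f = fopen (path, "rb");
  void *buf = NULL;
  long size;

  if (f == NULL)
    return NULL;

  /* Determine the file size up front so we can allocate exactly once. */
  if (fseek (f, 0, SEEK_END) != 0 || (size = ftell (f)) < 0
      || fseek (f, 0, SEEK_SET) != 0)
    goto out;

  buf = alloc ((size_t) size);
  if (buf == NULL)
    goto out;

  /* Read straight into the caller-chosen buffer: no intermediate copy. */
  if (fread (buf, 1, (size_t) size, f) != (size_t) size)
    {
      free (buf); /* only valid because the sketch's allocator uses malloc() */
      buf = NULL;
      goto out;
    }

  *size_out = (size_t) size;

out:
  fclose (f);
  return buf;
}
```

A caller would then do something like buf = file_open_read_close(path, initrd_alloc, &size). A real version would presumably also need a matching free callback, since the buffer does not necessarily come from the normal heap.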
But that seems like a 2nd step that's a bit more complex than bypassing the MM for large allocations and using bounce buffers for >4GB targets in grub_file_read().

[1] What is large? Perhaps it's just 100 pages, perhaps it's 4 MB. It depends on how different the performance is for the EFI call round trip vs doing it in our mm.

-- 
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en