This is a sketch of much more detailed descriptions of how grub mm works. I expect we'll want to bikeshed it a bit so I will roll up change requests into subsequent series I send that touch mm.
Signed-off-by: Daniel Axtens <d...@axtens.net> --- docs/grub-dev.texi | 405 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 399 insertions(+), 6 deletions(-) diff --git a/docs/grub-dev.texi b/docs/grub-dev.texi index fb2cc965ed80..2b3b21afe020 100644 --- a/docs/grub-dev.texi +++ b/docs/grub-dev.texi @@ -80,8 +80,7 @@ This edition documents version @value{VERSION}. * Updating External Code:: * Porting:: * Error Handling:: -* Stack and heap size:: -* BIOS port memory map:: +* Memory Management:: * Video Subsystem:: * PFF2 Font File Format:: * Graphical Menu Software Design:: @@ -1025,8 +1024,403 @@ if (grub_errno != GRUB_ERR_NONE) grub_error_pop (); @end example -@node Stack and heap size -@chapter Stack and heap size +@node Memory Management +@chapter Memory Management + +Grub supports both stack and heap memory. + +@section Heap management overview + +Grub memory management boils down to cells, blocks and regions. + +The most fundamental concept in grub mm data structures is that of the +`cell'. The cell is the atomic unit of memory in grub's memory management. A +cell is @code{GRUB_MM_ALIGN} bytes in size, which is 16 bytes on a 32-bit +platform and 32 bytes on a 64-bit platform. + +All metadata structures are exactly one cell in size and are cell-aligned. + +A block represents a block of memory that is either: +@itemize + +@item an allocation: memory that has been allocated by another part of + grub and is being used, or + +@item free: memory from which an allocation can be made. + +@end itemize + +All blocks are cell-aligned and all blocks are a whole number of cells in size. + +A region is an area of memory from firmware which contains one or more blocks. + +For more details, see the comment at the top of @code{kern/mm.c}. + +@subsection Where does heap memory come from? + +Heap memory comes from platform-specific code: the platform's memory map must be +consulted to find free memory regions. Depending on the platform, these regions +may need to be `claimed' in some way to protect them from being used by +firmware. Firmware may provide grub with a range of differently sized and +dis-contiguous regions. + +A region is defined by a @code{grub_mm_region_t} structure. This metadata +structure is stored in the first cell of each region. + +Region metadata consists of a pointer to a free block to speed up allocations +(@code{first}), a pointer to the next region (@code{next}) if one exists, the +number of bytes lost at the beginning to retain alignment (@code{pre_size}) and +the total size of the allocatable space in bytes (@code{size}). + +Regions are tracked as a linked list, with the head being @code{grub_mm_base}. + +Regions are created by platform-specific code. Once platform-specific code has +performed any necessary initialisation, it calls @code{grub_mm_init_region ()} +on the memory to make it available as a new region. Then it can used by the grub +memory management code to service allocation requests. + +If firmware passes a non-cell-aligned address to @code{grub_mm_init_region}, +then the address will be rounded up to cell-alignment, and the number of +`wasted' bytes stored in @code{pre_size}. Likewise, if the size after aligning +is not a multiple of the cell size, those bytes beyond the end of the last cell +cannot be used. (These bytes are not recorded.) + +When a region is added, grub will prefer to extend an existing region, if it can +do so. This allows for larger individual allocations to be satisfied. Currently +the code only supports merging when the new allocation is directly below an +existing region, that is, @code{new address + new size + previous region +pre_size = previous region base}. + +If a region cannot be merged into an existing region, a new region is created +and marked as containing a single free block. Regions are added in sorted order, +smallest first, but the region merging process does not respect the sorted order +and so the linked list may not remain in sorted order. + +@subsection Blocks: Free blocks and allocations + +Metadata for a block (either an allocation or a free block) is stored in a +@code{grub_mm_header_t} structure. + +The type of a block is set by the @code{magic} value. Its @code{size} is +measured in cells and includes the header cell. If it is a free block, the +@code{next} pointer points to the next free block in the circular linked +list. If the block is allocated, the @code{next} pointer is undefined. + +Metadata for a block is always stored in the cell immediately preceding the +block data. + +A region starts as one big free block. + +@subsection Example + +Say firmware provides you with one region running from 0xffff0 (1MB - 16 bytes) +to 0x10ffff0 (17MB - 16 bytes). Say you're on a 64-bit platform, so cell size is +32 bytes (0x20). The resultant layout is: + +@example +0x00ffff0 +- Start of region as seen by firmware + | ("pre-space", considered in region merging) +0x0100000 +- grub_mm_region_t + | first free block: 0x100020 + | next region: NULL + | pre_size (wasted bytes to ensure alignment): 0x10 + | size: 0xffffa0 (bytes, rounded to cells, not including our cell) + | (The last 16 bytes are wasted as there is not enough + | to form a cell. These are not recorded.) + | +0x0100020 +- grub_mm_header_t + | next free block: 0x100020 (circular linked list) + | size: 0x7ffff cells (region size in cells) + | magic: GRUB_MM_FREE_MAGIC + | <padding> +0x0100040 | -- + | free space belonging to the block + | +0x10fffe0 +- End of region as seen by grub + | (lost space) +0x10ffff0 +- End of region as seen by firmware +@end example + +Now say you want to allocate 1MB of memory with 1MB alignment. You would call +something like @code{grub_memalign (1024 * 1024, 1024 * 1024)}. This would then +call @code{grub_real_malloc ()} with our region. That function is well +documented: refer to the comments there for precisely how it operates. + +In this case, @code{grub_real_malloc ()} will satisfy the allocation, splitting +the free block into 3 and allocating as high an address as possible: + +@example +0x0100000 +- grub_mm_region_t + | first free block: 0x0100020 + | next region: NULL + | pre_size (wasted bytes to ensure alignment): 0x10 + | size: 0xffffa0 (bytes, rounded to cells, not including our cell) + | +0x0100020 +- grub_mm_header_t + | next free block: 0x1000000 + | size: 0x6ffff cells + | magic: GRUB_MM_FREE_MAGIC + | <padding> +0x0100040 | -- + | free space belonging to the block + | +0x0efffe0 +- grub_mm_header_t + | next free block: undefined + | size: 0x8001 cells (1MB + header cell) + | magic: GRUB_MM_ALLOC_MAGIC + | <padding> +0x0f00000 | -- + | allocated space + | +0x1000000 +- grub_mm_header_t + | next free block: 0x0100020 + | size: 0x7fff cells + | magic: GRUB_MM_FREE_MAGIC + | <padding> +0x1000020 | -- + | free space belonging to block + | +0x10fffe0 +- End of region as seen by grub +@end example + +In this case the free pointer in the region will not be updated. In some cases +it is updated on allocations: for example when the region it points to is no +longer free, or for small allocations. + +Now we make a small allocation of a few bytes. This will be rounded up to two +cells - one for the data and one for the header. @code{grub_real_malloc ()} will +iterate over the linked list of free block using a pair of pointers representing +the previous and current free block, @code{prev} and @code{cur} +respectively. @code{prev} starts at the region's @code{*first} pointer, and +@code{cur} starts as @code{*first->next}. + +This will be the result: + +@example +0x0100000 +- grub_mm_region_t + | first free block: 0x0100020 + | next region: NULL + | pre_size (wasted bytes to ensure alignment): 0x10 + | size: 0xffffa0 (bytes) + | +0x0100020 +- grub_mm_header_t + | next free block: 0x1000000 + | size: 0x6ffff cells + | magic: GRUB_MM_FREE_MAGIC +0x0100040 | -- + | free space belonging to the block + | +0x0efffe0 +- grub_mm_header_t + | size: 0x8001 cells (1MB + header cell) + | magic: GRUB_MM_ALLOC_MAGIC +0x0f00000 | -- + | allocated space + | +0x1000000 +- grub_mm_header_t + | next free block: 0x0100020 + | size: 0x7ffd cells + | magic: GRUB_MM_FREE_MAGIC +0x1000020 | -- + | free space belonging to block + | +0x10fffa0 +- grub_mm_header_t + | size: 0x2 cells (32 bytes + header cell) + | magic: GRUB_MM_ALLOC_MAGIC +0x10fffc0 | -- + | allocated space + | +0x10fffe0 +- End of region as seen by grub +@end example + +In this case, we do hit the code block at the end of @code{grub_real_malloc} +that would move @code{first}, but because we never advanced @code{prev}, +@code{*first} remains unchanged. + +So a subsequent allocation will again be made from the second free block if +possible - for example if we were to allocate 64kB, the result would be the +following: + +@example +0x0100000 +- grub_mm_region_t + | first free block: 0x0100020 + | next region: NULL + | pre_size (wasted bytes to ensure alignment): 0x10 + | size: 0xffffa0 (bytes) + | +0x0100020 +- grub_mm_header_t + | next free block: 0x1000000 + | size: 0x6ffff cells + | magic: GRUB_MM_FREE_MAGIC +0x0100040 | -- + | free space belonging to the block + | +0x0efffe0 +- grub_mm_header_t + | size: 0x8001 cells (1MB + header cell) + | magic: GRUB_MM_ALLOC_MAGIC +0x0f00000 | -- + | allocated space + | +0x1000000 +- grub_mm_header_t + | next free block: 0x0100020 + | size: 0x77fc cells + | magic: GRUB_MM_FREE_MAGIC +0x1000020 | -- + | free space belonging to block + | +0x10eff80 +- grub_mm_header_t + | size: 0x801 cells (64kB + header cell) + | magic: GRUB_MM_ALLOC_MAGIC +0x10effa0 | -- + | allocated space + | +0x10fffa0 +- grub_mm_header_t + | size: 0x2 cells (32 bytes + header cell) + | magic: GRUB_MM_ALLOC_MAGIC +0x10fffc0 | -- + | allocated space + | +0x10fffe0 +- End of region as seen by grub +@end example + +@subsection Freeing an allocation + +To free an allocation, the block's magic is flipped to +@code{GRUB_MM_FREE_MAGIC}, it's placed into the free list, and merged with any +adjacent free blocks. + +There are two things to be aware of. + +Firstly, the free-list ordering goes `backwards', from high addresses to low +addresses. That is, except in the case where the list wraps around, +@code{free_block > free_block->next}. + +Secondly, the region's @code{first} pointer is updated to the block preceding +the block that has just been freed. This means (because the first pointer points +to the region @emph{before} the region that will be used to service an +allocation) that the next allocation will try to use the newly freed block. + +To illustrate, say we free our 32 byte allocation: + +@example +0x0100000 +- grub_mm_region_t + | first free block: 0x0100020 + | next region: NULL + | pre_size (wasted bytes to ensure alignment): 0x10 + | size: 0xffffa0 (bytes) + | +0x0100020 +- grub_mm_header_t + | next free block: 0x10fffa0 + | size: 0x6ffff cells + | magic: GRUB_MM_FREE_MAGIC +0x0100040 | -- + | free space belonging to the block + | +0x0efffe0 +- grub_mm_header_t + | size: 0x8001 cells (1MB + header cell) + | magic: GRUB_MM_ALLOC_MAGIC +0x0f00000 | -- + | allocated space + | +0x1000000 +- grub_mm_header_t + | next free block: 0x0100020 + | size: 0x77fc cells + | magic: GRUB_MM_FREE_MAGIC +0x1000020 | -- + | free space belonging to block + | +0x10eff80 +- grub_mm_header_t + | size: 0x801 cells (64kB + header cell) + | magic: GRUB_MM_ALLOC_MAGIC +0x10effa0 | -- + | allocated space + | +0x10fffa0 +- grub_mm_header_t + | next free block: 0x1000000 + | size: 0x2 cells (32 bytes + header cell) + | magic: GRUB_MM_FREE_MAGIC +0x10fffc0 | -- + | free space belonging to block + | +0x10fffe0 +- End of region as seen by grub +@end example + +The final block is now freed; it is inserted in the ring such that each region's +next is at a lower address that the previous free region, modulo wrapping. The +region's @code{first} is set such that @code{*first->next} is the block we just +freed. + +@subsection Memory pressure and large allocations + +The largest allocation that can be serviced depends on the platform and heap +fragmentation. + +If an allocation cannot be serviced, grub will attempt to invalidate disk caches +to free up memory, and try again. + +Grub does not implement virtual memory or paging: there is no way for it to +stitch together various dis-contiguous pages from firmware into a contiguous +address space. + +Grub always searches regions in order: if you are trying to allocate, for +example, a number of 64kB objects and your region list starts with small regions +with numerous free blocks that are all too small to satisfy the allocation, +@code{grub_memalign ()} will have to check each of those regions every time +before finally getting to a region that can satisfy the allocation. + +@subsection Summary: the life-cycle of an allocation from the heap + +@itemize @bullet + +@item A heap region is added in grub initialisation by calling + @code{grub_mm_init_region ()}. The region is marked as a single free + block. It is inserted into the list of regions. + +@item A call to @code{grub_malloc ()} is made, seeking a certain amount of + memory. + +@item This is translated into a call to @code{grub_memalign ()} with no + alignment requirement. + +@item For each region from first to last: + + @itemize + + @item Call @code{grub_real_malloc ()} with that region to try to satisfy the + allocation. + + @item Starting from the region's @code{*first->next}, which is usually the + most recently freed block, try each block in the free ring. + + @item If it is large enough to satisfy the size and alignment requirements, + use it. Carve out an appropriate block, mark it as allocated, and adjust + the size and shape of the free space as required. + + @item If we found a block, and we allocated less than 32kB, or we allocated + the entire free block, mark the previous block as the first free block. + + @item Return the block we found. If we cannot find an appropriate block, + return NULL. + +@end itemize + +@item The caller uses their block of memory. + +@item A call to @code{grub_free ()} is made to release the memory back to + grub. The block is reinserted into the free list, merging with adjacent + free blocks if possible. The region's @code{first} pointer is set such + that @code{*first->next} is the most recently freed region. + +@end itemize + +@section Another memory user: the relocator + +The grub mm code is not the only piece of code that uses memory from +firmware. Grub also a `relocator' infrastructure that can be used to find +appropriate places in memory to lay out boot material. This code talks to the +platform specific code to iterate over available memory from firmware, but +rather than adding it to the heap as a region, the relocator uses it directly. + +@section Stack and heap sizes On emu stack and heap are just normal host OS stack and heap. Stack is typically 8 MiB although it's OS-dependent. @@ -1088,8 +1482,7 @@ In short: @end multitable -@node BIOS port memory map -@chapter BIOS port memory map +@section BIOS port memory map @c By Yoshinori K Okuji @multitable @columnfractions .15 .25 .5 -- 2.32.0 _______________________________________________ Grub-devel mailing list Grub-devel@gnu.org https://lists.gnu.org/mailman/listinfo/grub-devel