hugetlb: make bootmem allocation work with KHO

sashiko-bot Wed, 27 May 2026 18:07:13 -0700

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) 
to consider:
- [High] Incorrect NUMA node extraction for huge page allocation fallback due 
to inverted `node_exact` condition, breaking HugeTLB Vmemmap Optimization (HVO).
- [Medium] Silent memory leak in `memblock_alloc_nid_user` error path due to 
inherently failing `memblock_phys_free`.
--


commit 8f512f66429ea3e879e3104adb050f66df1c4fb3
Author: Pratyush Yadav (Google) <[email protected]>

mm/hugetlb: make bootmem allocation work with KHO

This patch addresses KHO scratch size accounting issues by introducing
memblock_alloc_nid_user(). It ensures that huge pages are not allocated
from scratch areas and removes the MEMBLOCK_RSRV_KERN flag from these
allocations, preventing them from artificially inflating scratch space
requirements.

> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 571212b80835e..46f2b1bd5abe8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3033,26 +3033,19 @@ static __init void *alloc_bootmem(struct hstate *h, int nid, bool node_exact)
[ ... ]
> -             if (node_exact)
> -                     m = memblock_alloc_exact_nid_raw(huge_page_size(h),
> -                             huge_page_size(h), 0,
> -                             MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> -             else {
> -                     m = memblock_alloc_try_nid_raw(huge_page_size(h),
> -                             huge_page_size(h), 0,
> -                             MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> +             m = memblock_alloc_nid_user(huge_page_size(h), huge_page_size(h),
> +                                         nid, node_exact);
> +             if (m) {
>                       /*
>                        * For pre-HVO to work correctly, pages need to be on
>                        * the list for the node they were actually allocated
>                        * from. That node may be different in the case of
> -                      * fallback by memblock_alloc_try_nid_raw. So,
> -                      * extract the actual node first.
> +                      * fallback by memblock_alloc_try_nid_raw. So, extract
> +                      * the actual node first.
>                        */
> -                     if (m)
> +                     if (node_exact)
>                               listnode = early_pfn_to_nid(PHYS_PFN(__pa(m)));

[Severity: High]
Does this extract the node under the wrong condition?

The original code extracted the node only in the fallback path (!node_exact),
since exact allocations guarantee the requested node is used. 

With this change, if node_exact is true, we redundantly extract the node,
but if node_exact is false (meaning a fallback could have happened), we skip
extracting the actual node.

Can this place fallback-allocated huge pages onto the wrong node's list,
breaking NUMA locality and pre-HVO assumptions?
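
To make the concern concrete, here is a small userspace model of the node
selection. It is a sketch, not kernel code: both function names are
hypothetical, and the actual_nid parameter stands in for the value that
early_pfn_to_nid(PHYS_PFN(__pa(m))) would return.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. actual_nid stands in for
 * early_pfn_to_nid(PHYS_PFN(__pa(m))); both function names are hypothetical.
 */

/* Node selection as written in the quoted hunk. */
int listnode_patched(int requested_nid, int actual_nid, bool node_exact)
{
	int listnode = requested_nid;

	/* An exact allocation cannot land on another node. */
	assert(!node_exact || actual_nid == requested_nid);

	if (node_exact)
		listnode = actual_nid;
	return listnode;
}

/* Node selection with the pre-patch condition restored. */
int listnode_prepatch(int requested_nid, int actual_nid, bool node_exact)
{
	int listnode = requested_nid;

	assert(!node_exact || actual_nid == requested_nid);

	if (!node_exact)
		listnode = actual_nid;
	return listnode;
}
```

For a request on node 0 that falls back to node 1 with node_exact false, the
patched form keeps node 0 while the pre-patch form reports node 1. For exact
allocations the two forms agree.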

[ ... ]
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 6f76a6bb96d61..8cd52d34ad6e4 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
[ ... ]
> @@ -1784,6 +1817,47 @@ void * __init memblock_alloc_try_nid_raw(
> +void * __init memblock_alloc_nid_user(phys_addr_t size, phys_addr_t align,
> +                                   int nid, bool exact_nid)
> +{
[ ... ]
> +     alloc = __memblock_alloc_range_nid(size, align, 0, MEMBLOCK_ALLOC_ACCESSIBLE,
> +                                        nid, exact_nid, flags);
> +     if (!alloc)
> +             return NULL;
> +
> +     /* User memory should not be marked with RSRV_KERN. */
> +     if (memblock_reserved_clear_kern(alloc, size)) {
> +             memblock_phys_free(alloc, size);
> +             return NULL;
> +     }

[Severity: Medium]
Will this error path silently leak memory?

If memblock_reserved_clear_kern() fails, the likely cause is that
memblock_isolate_range() needed room to split a region, the region array was
full, and memblock_double_array() could not grow it.

When we then call memblock_phys_free(), it eventually calls
memblock_remove_range(), which relies on the exact same
memblock_isolate_range() call to split the region before removing it.

Since the array is still full, this isolation will deterministically fail
again, and memblock_phys_free() will return without actually freeing the
memory, leaving it permanently reserved with MEMBLOCK_RSRV_KERN set.
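
The failure chain can be modelled in userspace as below. This is a toy
sketch, not memblock itself: every name is hypothetical, and it assumes that
isolation fails whenever the array lacks two spare entries and cannot grow,
and that both the flag clear and the free go through that same isolation
step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the reserved-region array; all names are hypothetical. */
struct toy_regions {
	size_t cnt;      /* entries in use */
	size_t max;      /* array capacity */
	bool can_grow;   /* false models memblock_double_array() failing */
	bool reserved;   /* the allocation is still reserved */
	bool rsrv_kern;  /* the allocation still carries RSRV_KERN */
};

/* Isolation needs room for up to two extra entries before it can split. */
int toy_isolate_range(struct toy_regions *r)
{
	assert(r != NULL);
	while (r->cnt + 2 > r->max) {
		if (!r->can_grow)
			return -1;
		r->max = r->max ? r->max * 2 : 4;
	}
	return 0;
}

/* Clearing the flag only happens if isolation succeeds. */
int toy_clear_kern(struct toy_regions *r)
{
	if (toy_isolate_range(r))
		return -1;
	r->rsrv_kern = false;
	return 0;
}

/* Freeing depends on the same isolation step. */
int toy_phys_free(struct toy_regions *r)
{
	if (toy_isolate_range(r))
		return -1;
	r->reserved = false;
	return 0;
}

/* Mirrors the shape of the quoted error path, ignored return included. */
bool toy_alloc_user(struct toy_regions *r)
{
	r->reserved = true;
	r->rsrv_kern = true;
	if (toy_clear_kern(r)) {
		toy_phys_free(r);	/* result ignored, as in the hunk */
		return false;
	}
	return true;
}
```

With a full array that cannot grow, toy_alloc_user() returns false while the
range stays reserved and flagged, which is the leak described above. Checking
the return value of the free would at least make the failure visible.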

-- 
Sashiko AI review · 
https://sashiko.dev/#/patchset/[email protected]?part=12

Re: [RFC PATCH 12/20] mm/hugetlb: make bootmem allocation work with KHO
