On 7 Jan 2026, at 16:15, Matthew Brost wrote:

> On Wed, Jan 07, 2026 at 03:38:35PM -0500, Zi Yan wrote:
>> On 7 Jan 2026, at 15:20, Zi Yan wrote:
>>
>>> +THP folks
>>
>> +willy, since he commented in another thread.
>>
>>>
>>> On 16 Dec 2025, at 15:10, Francois Dugast wrote:
>>>
>>>> From: Matthew Brost <[email protected]>
>>>>
>>>> Introduce migrate_device_split_page() to split a device page into
>>>> lower-order pages. Used when a folio allocated as higher-order is freed
>>>> and later reallocated at a smaller order by the driver memory manager.
>>>>
>>>> Cc: Andrew Morton <[email protected]>
>>>> Cc: Balbir Singh <[email protected]>
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Signed-off-by: Matthew Brost <[email protected]>
>>>> Signed-off-by: Francois Dugast <[email protected]>
>>>> ---
>>>>  include/linux/huge_mm.h |  3 +++
>>>>  include/linux/migrate.h |  1 +
>>>>  mm/huge_memory.c        |  6 ++---
>>>>  mm/migrate_device.c     | 49 +++++++++++++++++++++++++++++++++++++++++
>>>>  4 files changed, 56 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>> index a4d9f964dfde..6ad8f359bc0d 100644
>>>> --- a/include/linux/huge_mm.h
>>>> +++ b/include/linux/huge_mm.h
>>>> @@ -374,6 +374,9 @@ int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list
>>>>  int folio_split_unmapped(struct folio *folio, unsigned int new_order);
>>>>  unsigned int min_order_for_split(struct folio *folio);
>>>>  int split_folio_to_list(struct folio *folio, struct list_head *list);
>>>> +int __split_unmapped_folio(struct folio *folio, int new_order,
>>>> +		struct page *split_at, struct xa_state *xas,
>>>> +		struct address_space *mapping, enum split_type split_type);
>>>>  int folio_check_splittable(struct folio *folio, unsigned int new_order,
>>>>  			   enum split_type split_type);
>>>>  int folio_split(struct folio *folio, unsigned int new_order, struct page *page,
>>>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>>>> index 26ca00c325d9..ec65e4fd5f88 100644
>>>> --- a/include/linux/migrate.h
>>>> +++ b/include/linux/migrate.h
>>>> @@ -192,6 +192,7 @@ void migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns,
>>>>  			unsigned long npages);
>>>>  void migrate_device_finalize(unsigned long *src_pfns,
>>>>  			unsigned long *dst_pfns, unsigned long npages);
>>>> +int migrate_device_split_page(struct page *page);
>>>>
>>>>  #endif /* CONFIG_MIGRATION */
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 40cf59301c21..7ded35a3ecec 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -3621,9 +3621,9 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>>>>   * Return: 0 - successful, <0 - failed (if -ENOMEM is returned, @folio might be
>>>>   * split but not to @new_order, the caller needs to check)
>>>>   */
>>>> -static int __split_unmapped_folio(struct folio *folio, int new_order,
>>>> -		struct page *split_at, struct xa_state *xas,
>>>> -		struct address_space *mapping, enum split_type split_type)
>>>> +int __split_unmapped_folio(struct folio *folio, int new_order,
>>>> +		struct page *split_at, struct xa_state *xas,
>>>> +		struct address_space *mapping, enum split_type split_type)
>>>>  {
>>>>  	const bool is_anon = folio_test_anon(folio);
>>>>  	int old_order = folio_order(folio);
>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>> index 23379663b1e1..eb0f0e938947 100644
>>>> --- a/mm/migrate_device.c
>>>> +++ b/mm/migrate_device.c
>>>> @@ -775,6 +775,49 @@ int migrate_vma_setup(struct migrate_vma *args)
>>>>  EXPORT_SYMBOL(migrate_vma_setup);
>>>>
>>>>  #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>>>> +/**
>>>> + * migrate_device_split_page() - Split device page
>>>> + * @page: Device page to split
>>>> + *
>>>> + * Splits a device page into smaller pages. Typically called when reallocating a
>>>> + * folio to a smaller size. Inherently racy—only safe if the caller ensures
>>>> + * mutual exclusion within the page's folio (i.e., no other threads are using
>>>> + * pages within the folio). Expected to be called on a free device page and
>>>> + * restores all split out pages to a free state.
>>>> + */
>>
>> Do you mind explaining why __split_unmapped_folio() is needed for a free device
>> page? A free page is not supposed to be a large folio, at least from a core
>> MM point of view. __split_unmapped_folio() is intended to work on large folios
>> (or compound pages), even if the input folio has refcount == 0 (because it is
>> frozen).
>>
>
> Well, then maybe this is a bug in core MM where the freed page is still
> a THP. Let me explain the scenario and why this is needed from my POV.
>
> Our VRAM allocator in Xe (and several other DRM drivers) is DRM buddy.
> This is a shared pool between traditional DRM GEMs (buffer objects) and
> SVM allocations (pages). It doesn’t have any view of the page backing—it
> basically just hands back a pointer to VRAM space that we allocate from.
> From that, if it’s an SVM allocation, we can derive the device pages.
>
> What I see happening is: a 2M buddy allocation occurs, we make the
> backing device pages a large folio, and sometime later the folio
> refcount goes to zero and we free the buddy allocation. Later, the buddy
> allocation is reused for a smaller allocation (e.g., 4K or 64K), but the
> backing pages are still a large folio. Here is where we need to split

I agree with you that it might be a bug in free_zone_device_folio() based on my
understanding: zone_device_page_init() calls prep_compound_page() for >0 orders,
but free_zone_device_folio() never reverses the process. Balbir and Alistair
might be able to help here.

I cherry-picked the code from __free_frozen_pages() to reverse the process.
Can you give it a try to see if it solves the above issue? Thanks.

From 3aa03baa39b7e62ea079e826de6ed5aab3061e46 Mon Sep 17 00:00:00 2001
From: Zi Yan <[email protected]>
Date: Wed, 7 Jan 2026 16:49:52 -0500
Subject: [PATCH] mm/memremap: free device private folio fix
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Zi Yan <[email protected]>
---
 mm/memremap.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/memremap.c b/mm/memremap.c
index 63c6ab4fdf08..483666ff7271 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -475,6 +475,21 @@ void free_zone_device_folio(struct folio *folio)
 		pgmap->ops->folio_free(folio);
 		break;
 	}
+
+	if (nr > 1) {
+		struct page *head = folio_page(folio, 0);
+
+		head[1].flags.f &= ~PAGE_FLAGS_SECOND;
+#ifdef NR_PAGES_IN_LARGE_FOLIO
+		folio->_nr_pages = 0;
+#endif
+		for (i = 1; i < nr; i++) {
+			(head + i)->mapping = NULL;
+			clear_compound_head(head + i);
+		}
+		folio->mapping = NULL;
+		head->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
+	}
 }

 void zone_device_page_init(struct page *page, unsigned int order)
--
2.51.0

> the folio into 4K pages so we can properly migrate the pages via the
> migrate_vma_* calls. Also note: if you call zone_device_page_init with
> an order of zero on a large device folio, that also blows up.
>
> Open to other ideas here for how to handle this scenario.
>
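
To make the reuse scenario above concrete, here is a minimal, illustrative
driver-side sketch. The function name reuse_device_block_at_order_0() and the
surrounding flow are assumptions for illustration only; migrate_device_split_page(),
page_folio(), folio_order() and zone_device_page_init() are the pieces taken
from the patch and the kernel:

/*
 * Illustrative sketch only (not part of the patch): reuse a freed buddy
 * block at order 0 when its backing device page may still belong to a
 * large folio.  The caller is assumed to own every page of the former
 * folio, per the migrate_device_split_page() kerneldoc.
 */
static int reuse_device_block_at_order_0(struct page *page)
{
	int ret;

	if (folio_order(page_folio(page))) {
		/* Split the stale large folio back into order-0 free pages. */
		ret = migrate_device_split_page(page);
		if (ret)
			return ret;
	}

	/* Re-initialise the now order-0 device page for the new allocation. */
	zone_device_page_init(page, 0);
	return 0;
}

After the split, every page of the former folio is back in a free, order-0
state, so the subsequent zone_device_page_init() call at order 0 no longer
trips over leftover compound state.
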
>>>> +int migrate_device_split_page(struct page *page)
>>>> +{
>>>> +	struct folio *folio = page_folio(page);
>>>> +	struct dev_pagemap *pgmap = folio->pgmap;
>>>> +	struct page *unlock_page = folio_page(folio, 0);
>>>> +	unsigned int order = folio_order(folio), i;
>>>> +	int ret = 0;
>>>> +
>>>> +	VM_BUG_ON_FOLIO(!order, folio);
>>>> +	VM_BUG_ON_FOLIO(!folio_is_device_private(folio), folio);
>>>> +	VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
>>
>> Please use VM_WARN_ON_FOLIO() instead to catch errors. There is no need to
>> crash the kernel
>>
>
> Sure.
>
>>>> +
>>>> +	folio_lock(folio);
>>>> +
>>>> +	ret = __split_unmapped_folio(folio, 0, page, NULL, NULL, SPLIT_TYPE_UNIFORM);
>>>> +	if (ret) {
>>>> +		/*
>>>> +		 * We can't fail here unless the caller doesn't know what they
>>>> +		 * are doing.
>>>> +		 */
>>>> +		VM_BUG_ON_FOLIO(ret, folio);
>>
>> Same here.
>>
> Will do.
>
> Matt
>
>>>> +
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	for (i = 0; i < 0x1 << order; ++i, ++unlock_page) {
>>>> +		page_folio(unlock_page)->pgmap = pgmap;
>>>> +		folio_unlock(page_folio(unlock_page));
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  /**
>>>>   * migrate_vma_insert_huge_pmd_page: Insert a huge folio into @migrate->vma->vm_mm
>>>>   * at @addr. folio is already allocated as a part of the migration process with
>>>> @@ -927,6 +970,11 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>  	return ret;
>>>>  }
>>>>  #else /* !CONFIG_ARCH_ENABLE_THP_MIGRATION */
>>>> +int migrate_device_split_page(struct page *page)
>>>> +{
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  static int migrate_vma_insert_huge_pmd_page(struct migrate_vma *migrate,
>>>>  					    unsigned long addr,
>>>>  					    struct page *page,
>>>> @@ -943,6 +991,7 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>  	return 0;
>>>>  }
>>>>  #endif
>>>> +EXPORT_SYMBOL(migrate_device_split_page);
>>>>
>>>>  static unsigned long migrate_vma_nr_pages(unsigned long *src)
>>>>  {
>>>> --
>>>> 2.43.0
>>>
>>>
>>> Best Regards,
>>> Yan, Zi
>>
>>
>> Best Regards,
>> Yan, Zi


Best Regards,
Yan, Zi
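
If the free_zone_device_folio() change proposed above turns out to be the right
fix, the driver reuse path could presumably skip the explicit split entirely,
since a freed device page would no longer be part of a large folio. A hedged
sketch of that assumed flow (the function name is illustrative, not an existing
API):

/*
 * Assumed flow with the memremap.c fix applied: compound state is dissolved
 * at free time, so the page can be re-initialised directly at the new,
 * smaller order without calling migrate_device_split_page().
 */
static void reuse_device_block_after_fix(struct page *page, unsigned int new_order)
{
	/* With the fix, no large-folio state should survive the free. */
	VM_WARN_ON_FOLIO(folio_order(page_folio(page)), page_folio(page));

	zone_device_page_init(page, new_order);
}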
