On 7 Jan 2026, at 16:15, Matthew Brost wrote:

> On Wed, Jan 07, 2026 at 03:38:35PM -0500, Zi Yan wrote:
>> On 7 Jan 2026, at 15:20, Zi Yan wrote:
>>
>>> +THP folks
>>
>> +willy, since he commented in another thread.
>>
>>>
>>> On 16 Dec 2025, at 15:10, Francois Dugast wrote:
>>>
>>>> From: Matthew Brost <[email protected]>
>>>>
>>>> Introduce migrate_device_split_page() to split a device page into
>>>> lower-order pages. Used when a folio allocated as higher-order is freed
>>>> and later reallocated at a smaller order by the driver memory manager.
>>>>
>>>> Cc: Andrew Morton <[email protected]>
>>>> Cc: Balbir Singh <[email protected]>
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Signed-off-by: Matthew Brost <[email protected]>
>>>> Signed-off-by: Francois Dugast <[email protected]>
>>>> ---
>>>>  include/linux/huge_mm.h |  3 +++
>>>>  include/linux/migrate.h |  1 +
>>>>  mm/huge_memory.c        |  6 ++---
>>>>  mm/migrate_device.c     | 49 +++++++++++++++++++++++++++++++++++++++++
>>>>  4 files changed, 56 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>> index a4d9f964dfde..6ad8f359bc0d 100644
>>>> --- a/include/linux/huge_mm.h
>>>> +++ b/include/linux/huge_mm.h
>>>> @@ -374,6 +374,9 @@ int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list
>>>>  int folio_split_unmapped(struct folio *folio, unsigned int new_order);
>>>>  unsigned int min_order_for_split(struct folio *folio);
>>>>  int split_folio_to_list(struct folio *folio, struct list_head *list);
>>>> +int __split_unmapped_folio(struct folio *folio, int new_order,
>>>> +		struct page *split_at, struct xa_state *xas,
>>>> +		struct address_space *mapping, enum split_type split_type);
>>>>  int folio_check_splittable(struct folio *folio, unsigned int new_order,
>>>>  			   enum split_type split_type);
>>>>  int folio_split(struct folio *folio, unsigned int new_order, struct page *page,
>>>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>>>> index 26ca00c325d9..ec65e4fd5f88 100644
>>>> --- a/include/linux/migrate.h
>>>> +++ b/include/linux/migrate.h
>>>> @@ -192,6 +192,7 @@ void migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns,
>>>>  			unsigned long npages);
>>>>  void migrate_device_finalize(unsigned long *src_pfns,
>>>>  			unsigned long *dst_pfns, unsigned long npages);
>>>> +int migrate_device_split_page(struct page *page);
>>>>
>>>>  #endif /* CONFIG_MIGRATION */
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 40cf59301c21..7ded35a3ecec 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -3621,9 +3621,9 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>>>>   * Return: 0 - successful, <0 - failed (if -ENOMEM is returned, @folio might be
>>>>   * split but not to @new_order, the caller needs to check)
>>>>   */
>>>> -static int __split_unmapped_folio(struct folio *folio, int new_order,
>>>> -		struct page *split_at, struct xa_state *xas,
>>>> -		struct address_space *mapping, enum split_type split_type)
>>>> +int __split_unmapped_folio(struct folio *folio, int new_order,
>>>> +		struct page *split_at, struct xa_state *xas,
>>>> +		struct address_space *mapping, enum split_type split_type)
>>>>  {
>>>>  	const bool is_anon = folio_test_anon(folio);
>>>>  	int old_order = folio_order(folio);
>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>> index 23379663b1e1..eb0f0e938947 100644
>>>> --- a/mm/migrate_device.c
>>>> +++ b/mm/migrate_device.c
>>>> @@ -775,6 +775,49 @@ int migrate_vma_setup(struct migrate_vma *args)
>>>>  EXPORT_SYMBOL(migrate_vma_setup);
>>>>
>>>>  #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>>>> +/**
>>>> + * migrate_device_split_page() - Split device page
>>>> + * @page: Device page to split
>>>> + *
>>>> + * Splits a device page into smaller pages. Typically called when reallocating a
>>>> + * folio to a smaller size. Inherently racy—only safe if the caller ensures
>>>> + * mutual exclusion within the page's folio (i.e., no other threads are using
>>>> + * pages within the folio). Expected to be called on a free device page and
>>>> + * restores all split out pages to a free state.
>>>> + */
>>
>> Do you mind explaining why __split_unmapped_folio() is needed for a free device
>> page? A free page is not supposed to be a large folio, at least from a core
>> MM point of view. __split_unmapped_folio() is intended to work on large folios
>> (or compound pages), even if the input folio has refcount == 0 (because it is
>> frozen).
>>
>
> Well, then maybe this is a bug in core MM where the freed page is still
> a THP. Let me explain the scenario and why this is needed from my POV.
>
> Our VRAM allocator in Xe (and several other DRM drivers) is DRM buddy.
> This is a shared pool between traditional DRM GEMs (buffer objects) and
> SVM allocations (pages). It doesn’t have any view of the page backing—it
> basically just hands back a pointer to VRAM space that we allocate from.
> From that, if it’s an SVM allocation, we can derive the device pages.
>
> What I see happening is: a 2M buddy allocation occurs, we make the
> backing device pages a large folio, and sometime later the folio
> refcount goes to zero and we free the buddy allocation. Later, the buddy
> allocation is reused for a smaller allocation (e.g., 4K or 64K), but the
> backing pages are still a large folio. Here is where we need to split

I agree with you that it might be a bug in free_zone_device_folio() based on my
understanding: zone_device_page_init() calls prep_compound_page() for >0 orders,
but free_zone_device_folio() never reverses the process. Balbir and Alistair
might be able to help here.

I cherry-picked the code from __free_frozen_pages() to reverse the process.
Can you give it a try to see if it solves the above issue? Thanks.

From 3aa03baa39b7e62ea079e826de6ed5aab3061e46 Mon Sep 17 00:00:00 2001
From: Zi Yan <[email protected]>
Date: Wed, 7 Jan 2026 16:49:52 -0500
Subject: [PATCH] mm/memremap: free device private folio fix
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Zi Yan <[email protected]>
---
 mm/memremap.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/memremap.c b/mm/memremap.c
index 63c6ab4fdf08..483666ff7271 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -475,6 +475,21 @@ void free_zone_device_folio(struct folio *folio)
 		pgmap->ops->folio_free(folio);
 		break;
 	}
+
+	if (nr > 1) {
+		struct page *head = folio_page(folio, 0);
+
+		head[1].flags.f &= ~PAGE_FLAGS_SECOND;
+#ifdef NR_PAGES_IN_LARGE_FOLIO
+		folio->_nr_pages = 0;
+#endif
+		for (i = 1; i < nr; i++) {
+			(head + i)->mapping = NULL;
+			clear_compound_head(head + i);
+		}
+		folio->mapping = NULL;
+		head->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
+	}
 }

 void zone_device_page_init(struct page *page, unsigned int order)
--
2.51.0

> the folio into 4K pages so we can properly migrate the pages via the
> migrate_vma_* calls. Also note: if you call zone_device_page_init with
> an order of zero on a large device folio, that also blows up.
>
> Open to other ideas here for how to handle this scenario.
>
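
To make the reuse scenario above concrete, here is a minimal, illustrative
driver-side sketch. The function name reuse_device_block_at_order_0() and the
surrounding flow are assumptions for illustration only; migrate_device_split_page(),
page_folio(), folio_order() and zone_device_page_init() are the pieces taken
from the patch and the kernel:

/*
 * Illustrative sketch only (not part of the patch): reuse a freed buddy
 * block at order 0 when its backing device page may still belong to a
 * large folio.  The caller is assumed to own every page of the former
 * folio, per the migrate_device_split_page() kerneldoc.
 */
static int reuse_device_block_at_order_0(struct page *page)
{
	int ret;

	if (folio_order(page_folio(page))) {
		/* Split the stale large folio back into order-0 free pages. */
		ret = migrate_device_split_page(page);
		if (ret)
			return ret;
	}

	/* Re-initialise the now order-0 device page for the new allocation. */
	zone_device_page_init(page, 0);
	return 0;
}

After the split, every page of the former folio is back in a free, order-0
state, so the subsequent zone_device_page_init() call at order 0 no longer
trips over leftover compound state.
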
>>>> +int migrate_device_split_page(struct page *page)
>>>> +{
>>>> +	struct folio *folio = page_folio(page);
>>>> +	struct dev_pagemap *pgmap = folio->pgmap;
>>>> +	struct page *unlock_page = folio_page(folio, 0);
>>>> +	unsigned int order = folio_order(folio), i;
>>>> +	int ret = 0;
>>>> +
>>>> +	VM_BUG_ON_FOLIO(!order, folio);
>>>> +	VM_BUG_ON_FOLIO(!folio_is_device_private(folio), folio);
>>>> +	VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
>>
>> Please use VM_WARN_ON_FOLIO() instead to catch errors. There is no need to
>> crash the kernel
>>
>
> Sure.
>
>>>> +
>>>> +	folio_lock(folio);
>>>> +
>>>> +	ret = __split_unmapped_folio(folio, 0, page, NULL, NULL, SPLIT_TYPE_UNIFORM);
>>>> +	if (ret) {
>>>> +		/*
>>>> +		 * We can't fail here unless the caller doesn't know what they
>>>> +		 * are doing.
>>>> +		 */
>>>> +		VM_BUG_ON_FOLIO(ret, folio);
>>
>> Same here.
>>
> Will do.
>
> Matt
>
>>>> +
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	for (i = 0; i < 0x1 << order; ++i, ++unlock_page) {
>>>> +		page_folio(unlock_page)->pgmap = pgmap;
>>>> +		folio_unlock(page_folio(unlock_page));
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  /**
>>>>   * migrate_vma_insert_huge_pmd_page: Insert a huge folio into @migrate->vma->vm_mm
>>>>   * at @addr. folio is already allocated as a part of the migration process with
>>>> @@ -927,6 +970,11 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>  	return ret;
>>>>  }
>>>>  #else /* !CONFIG_ARCH_ENABLE_THP_MIGRATION */
>>>> +int migrate_device_split_page(struct page *page)
>>>> +{
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  static int migrate_vma_insert_huge_pmd_page(struct migrate_vma *migrate,
>>>>  					    unsigned long addr,
>>>>  					    struct page *page,
>>>> @@ -943,6 +991,7 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>  	return 0;
>>>>  }
>>>>  #endif
>>>> +EXPORT_SYMBOL(migrate_device_split_page);
>>>>
>>>>  static unsigned long migrate_vma_nr_pages(unsigned long *src)
>>>>  {
>>>> --
>>>> 2.43.0
>>>
>>>
>>> Best Regards,
>>> Yan, Zi
>>
>>
>> Best Regards,
>> Yan, Zi


Best Regards,
Yan, Zi
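
If the free_zone_device_folio() change proposed above turns out to be the right
fix, the driver reuse path could presumably skip the explicit split entirely,
since a freed device page would no longer be part of a large folio. A hedged
sketch of that assumed flow (the function name is illustrative, not an existing
API):

/*
 * Assumed flow with the memremap.c fix applied: compound state is dissolved
 * at free time, so the page can be re-initialised directly at the new,
 * smaller order without calling migrate_device_split_page().
 */
static void reuse_device_block_after_fix(struct page *page, unsigned int new_order)
{
	/* With the fix, no large-folio state should survive the free. */
	VM_WARN_ON_FOLIO(folio_order(page_folio(page)), page_folio(page));

	zone_device_page_init(page, new_order);
}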
