On 1/8/26 08:03, Zi Yan wrote:
> On 7 Jan 2026, at 16:15, Matthew Brost wrote:
> 
>> On Wed, Jan 07, 2026 at 03:38:35PM -0500, Zi Yan wrote:
>>> On 7 Jan 2026, at 15:20, Zi Yan wrote:
>>>
>>>> +THP folks
>>>
>>> +willy, since he commented in another thread.
>>>
>>>>
>>>> On 16 Dec 2025, at 15:10, Francois Dugast wrote:
>>>>
>>>>> From: Matthew Brost <[email protected]>
>>>>>
>>>>> Introduce migrate_device_split_page() to split a device page into
>>>>> lower-order pages. Used when a folio allocated as higher-order is freed
>>>>> and later reallocated at a smaller order by the driver memory manager.
>>>>>
>>>>> Cc: Andrew Morton <[email protected]>
>>>>> Cc: Balbir Singh <[email protected]>
>>>>> Cc: [email protected]
>>>>> Cc: [email protected]
>>>>> Signed-off-by: Matthew Brost <[email protected]>
>>>>> Signed-off-by: Francois Dugast <[email protected]>
>>>>> ---
>>>>>  include/linux/huge_mm.h |  3 +++
>>>>>  include/linux/migrate.h |  1 +
>>>>>  mm/huge_memory.c        |  6 ++---
>>>>>  mm/migrate_device.c     | 49 +++++++++++++++++++++++++++++++++++++++++
>>>>>  4 files changed, 56 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>> index a4d9f964dfde..6ad8f359bc0d 100644
>>>>> --- a/include/linux/huge_mm.h
>>>>> +++ b/include/linux/huge_mm.h
>>>>> @@ -374,6 +374,9 @@ int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list
>>>>>  int folio_split_unmapped(struct folio *folio, unsigned int new_order);
>>>>>  unsigned int min_order_for_split(struct folio *folio);
>>>>>  int split_folio_to_list(struct folio *folio, struct list_head *list);
>>>>> +int __split_unmapped_folio(struct folio *folio, int new_order,
>>>>> +                    struct page *split_at, struct xa_state *xas,
>>>>> +                    struct address_space *mapping, enum split_type split_type);
>>>>>  int folio_check_splittable(struct folio *folio, unsigned int new_order,
>>>>>                      enum split_type split_type);
>>>>>  int folio_split(struct folio *folio, unsigned int new_order, struct page *page,
>>>>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>>>>> index 26ca00c325d9..ec65e4fd5f88 100644
>>>>> --- a/include/linux/migrate.h
>>>>> +++ b/include/linux/migrate.h
>>>>> @@ -192,6 +192,7 @@ void migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns,
>>>>>                   unsigned long npages);
>>>>>  void migrate_device_finalize(unsigned long *src_pfns,
>>>>>                   unsigned long *dst_pfns, unsigned long npages);
>>>>> +int migrate_device_split_page(struct page *page);
>>>>>
>>>>>  #endif /* CONFIG_MIGRATION */
>>>>>
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 40cf59301c21..7ded35a3ecec 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -3621,9 +3621,9 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>>>>>   * Return: 0 - successful, <0 - failed (if -ENOMEM is returned, @folio might be
>>>>>   * split but not to @new_order, the caller needs to check)
>>>>>   */
>>>>> -static int __split_unmapped_folio(struct folio *folio, int new_order,
>>>>> -         struct page *split_at, struct xa_state *xas,
>>>>> -         struct address_space *mapping, enum split_type split_type)
>>>>> +int __split_unmapped_folio(struct folio *folio, int new_order,
>>>>> +                    struct page *split_at, struct xa_state *xas,
>>>>> +                    struct address_space *mapping, enum split_type split_type)
>>>>>  {
>>>>>   const bool is_anon = folio_test_anon(folio);
>>>>>   int old_order = folio_order(folio);
>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>> index 23379663b1e1..eb0f0e938947 100644
>>>>> --- a/mm/migrate_device.c
>>>>> +++ b/mm/migrate_device.c
>>>>> @@ -775,6 +775,49 @@ int migrate_vma_setup(struct migrate_vma *args)
>>>>>  EXPORT_SYMBOL(migrate_vma_setup);
>>>>>
>>>>>  #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>>>>> +/**
>>>>> + * migrate_device_split_page() - Split device page
>>>>> + * @page: Device page to split
>>>>> + *
>>>>> + * Splits a device page into smaller pages. Typically called when reallocating a
>>>>> + * folio to a smaller size. Inherently racy—only safe if the caller ensures
>>>>> + * mutual exclusion within the page's folio (i.e., no other threads are using
>>>>> + * pages within the folio). Expected to be called on a free device page and
>>>>> + * restores all split-out pages to a free state.
>>>>> + */
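
As an aside, my reading of the intended driver-side usage is roughly the
sketch below (only migrate_device_split_page() is from this patch; every
other name is a made-up placeholder, not actual Xe code):

	static int reuse_block_at_lower_order(struct page *free_page)
	{
		struct folio *folio = page_folio(free_page);
		int err;

		/* caller guarantees nothing else touches this folio */
		if (folio_order(folio)) {
			err = migrate_device_split_page(free_page);
			if (err)
				return err;
		}

		/* free_page is now order-0 and still free, ready to hand out */
		return 0;
	}
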
>>>
>>> Do you mind explaining why __split_unmapped_folio() is needed for a free device
>>> page? A free page is not supposed to be a large folio, at least from a core
>>> MM point of view. __split_unmapped_folio() is intended to work on large folios
>>> (or compound pages), even if the input folio has refcount == 0 (because it is
>>> frozen).
>>>
>>
>> Well, then maybe this is a bug in core MM where the freed page is still
>> a THP. Let me explain the scenario and why this is needed from my POV.
>>
>> Our VRAM allocator in Xe (and several other DRM drivers) is DRM buddy.
>> This is a shared pool between traditional DRM GEMs (buffer objects) and
>> SVM allocations (pages). It doesn’t have any view of the page backing—it
>> basically just hands back a pointer to VRAM space that we allocate from.
>> From that, if it’s an SVM allocation, we can derive the device pages.
>>
>> What I see happening is: a 2M buddy allocation occurs, we make the
>> backing device pages a large folio, and sometime later the folio
>> refcount goes to zero and we free the buddy allocation. Later, the buddy
>> allocation is reused for a smaller allocation (e.g., 4K or 64K), but the
>> backing pages are still a large folio. Here is where we need to split
> 
> I agree with you that it might be a bug in free_zone_device_folio() based
> on my understanding: zone_device_page_init() calls prep_compound_page()
> for >0 orders, but free_zone_device_folio() never reverses the process.
> 
> Balbir and Alistair might be able to help here.

I agree it's an API limitation.
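
Roughly, the asymmetry looks like this (simplified pseudo-view only, not the
exact code in the tree):

	void zone_device_page_init(struct page *page, unsigned int order)
	{
		/* ... refcount / zone setup elided ... */
		if (order)
			prep_compound_page(page, order);	/* head + tail state */
	}

	void free_zone_device_folio(struct folio *folio)
	{
		/* ... pgmap->ops->folio_free() and friends ... */
		/* no matching teardown of the compound state, so the freed
		 * pages still look like a large folio when the backing
		 * VRAM is handed out again at a smaller order */
	}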

> 
> I cherry picked the code from __free_frozen_pages() to reverse the process.
> Can you give it a try to see if it solves the above issue? Thanks.
> 
> From 3aa03baa39b7e62ea079e826de6ed5aab3061e46 Mon Sep 17 00:00:00 2001
> From: Zi Yan <[email protected]>
> Date: Wed, 7 Jan 2026 16:49:52 -0500
> Subject: [PATCH] mm/memremap: free device private folio fix
> Content-Type: text/plain; charset="utf-8"
> 
> Signed-off-by: Zi Yan <[email protected]>
> ---
>  mm/memremap.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 63c6ab4fdf08..483666ff7271 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -475,6 +475,21 @@ void free_zone_device_folio(struct folio *folio)
>               pgmap->ops->folio_free(folio);
>               break;
>       }
> +
> +     if (nr > 1) {
> +             struct page *head = folio_page(folio, 0);
> +
> +             head[1].flags.f &= ~PAGE_FLAGS_SECOND;
> +#ifdef NR_PAGES_IN_LARGE_FOLIO
> +             folio->_nr_pages = 0;
> +#endif
> +             for (i = 1; i < nr; i++) {
> +                     (head + i)->mapping = NULL;
> +                     clear_compound_head(head + i);

I see that you're skipping the checks in free_tail_page_prepare()? IIUC, we
should be able to invoke it even for zone device private pages.

> +             }
> +             folio->mapping = NULL;

This is already done in free_zone_device_folio()

> +             head->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;

I don't think this is required for zone device private folios, but I suppose it
keeps the code generic

> +     }
>  }
> 
>  void zone_device_page_init(struct page *page, unsigned int order)


Otherwise, it seems like the right way to solve the issue.
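
For testing, a small sanity check in the driver's reuse path could confirm the
teardown (illustrative only, not part of this patch):

	/* After free_zone_device_folio() has run, a reused device page must
	 * not look like part of a large folio any more. */
	static void assert_device_page_reusable(struct page *page)
	{
		VM_WARN_ON_ONCE(PageCompound(page));
		VM_WARN_ON_ONCE(page->mapping);
	}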

Balbir
