On 5 Jan 2026, at 21:39, Matthew Brost wrote:

> On Tue, Dec 16, 2025 at 01:39:50PM -0800, Matthew Brost wrote:
>> On Tue, Dec 16, 2025 at 08:34:46PM +0000, Matthew Wilcox wrote:
>>> On Tue, Dec 16, 2025 at 09:10:11PM +0100, Francois Dugast wrote:
>>>> +	ret = __split_unmapped_folio(folio, 0, page, NULL, NULL,
>>>> SPLIT_TYPE_UNIFORM);
>>>
>>> We're trying to get rid of uniform splits.  Why do you need this to be
>>> uniform?
>
> I looked into this bit more - we do want a uniform split here. What we
> want is to split the THP into 512 4k pages here.
>
> Per the doc for __split_unmapped_folio:
>
> 3590  * @split_at: in buddy allocator like split, the folio containing
> @split_at
> 3591  *            will be split until its order becomes @new_order.
>
> I think this implies some of the pages may still be a higher order which
> is not desired behavior for this usage.

IIUC, this is because there is no mTHP support in device private folios
yet, so a device private folio can only be order-0 or order-9. But after
adding mTHP support, a non-uniform split should work, since, as you said
below, only 4KB or 64KB is reallocated in CPU memory.

In terms of mTHP support in device private folios, how much effort will it
take? Maybe add a TODO in migrate_device_split_page(), saying to move to
NON_UNIFORM when mTHP support is ready.

>
> Matt
>
>>
>> It’s very possible we’re doing this incorrectly due to a lack of core MM
>> experience. I believe Zi Yan suggested this approach (use
>> __split_unmapped_folio) a while back.
>>
>> Let me start by explaining what we’re trying to do and see if there’s a
>> better suggestion for how to accomplish it.
>>
>> Would SPLIT_TYPE_NON_UNIFORM split work here? Or do you have another
>> suggestion on how to split the folio aside from __split_unmapped_folio?
>>
>> This covers the case where a GPU device page was allocated as a THP
>> (e.g., we call zone_device_folio_init with an order of 9). Later, this
>> page is freed/unmapped and then reallocated for a CPU VMA that is
>> smaller than a THP (e.g., we’d allocate either 4KB or 64KB based on
>> CPU VMA size alignment). At this point, we need to split the device
>> folio so we can migrate data into 4KB device pages.
>>
>> Would SPLIT_TYPE_NON_UNIFORM work here? Or do you have another
>> suggestion for splitting the folio aside from __split_unmapped_folio?
>>
>> Matt


Best Regards,
Yan, Zi
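
For reference, the difference between the two split types discussed in
this thread can be sketched with a small userspace toy model. This is not
kernel code and does not call __split_unmapped_folio(); the function names
model_uniform_split() and model_non_uniform_split() are hypothetical, and
the behaviour is only my reading of the kernel-doc quoted above (the folio
containing @split_at keeps being halved until it reaches @new_order, and
the other halves stay at their higher order).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model only (assumption: follows the quoted kernel-doc, not the
 * actual mm/huge_memory.c implementation). Each function writes the
 * order of every resulting folio into out[] and returns how many
 * folios the split produces.
 */

/* Uniform split: every resulting folio has order new_order. */
size_t model_uniform_split(unsigned int old_order, unsigned int new_order,
			   unsigned int *out, size_t cap)
{
	size_t n, i;

	assert(new_order <= old_order);
	n = (size_t)1 << (old_order - new_order);
	assert(n <= cap);

	for (i = 0; i < n; i++)
		out[i] = new_order;
	return n;
}

/*
 * Buddy-allocator-like split: halve the folio repeatedly, and only keep
 * splitting the half that contains split_at. split_at selects which
 * half is split further; it does not change the set of resulting orders.
 */
size_t model_non_uniform_split(unsigned int old_order, unsigned int new_order,
			       unsigned long split_at, unsigned int *out,
			       size_t cap)
{
	unsigned int order;
	size_t n = 0;

	assert(new_order <= old_order);
	assert(split_at < (1UL << old_order));

	for (order = old_order; order > new_order; order--) {
		/* The half not containing split_at stays at order - 1. */
		assert(n < cap);
		out[n++] = order - 1;
	}

	/* The folio containing split_at ends up at new_order. */
	assert(n < cap);
	out[n++] = new_order;
	return n;
}
```

Under this model, splitting an order-9 folio to order 0 uniformly yields
512 order-0 folios, while the non-uniform split yields 10 folios with
orders 8, 7, ..., 1, 0, 0. With new_order = 4 (64KB, assuming 4KB base
pages) the non-uniform split would leave orders 8, 7, 6, 5, 4, 4, which is
why it only fits once device private folios can have intermediate orders.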
