On 9/25/25 20:11, David Hildenbrand wrote:
> On 16.09.25 14:21, Balbir Singh wrote:
>> Implement CPU fault handling for zone device THP entries through
>> do_huge_pmd_device_private(), enabling transparent migration of
>> device-private large pages back to system memory on CPU access.
>>
>> When the CPU accesses a zone device THP entry, the fault handler calls the
>> device driver's migrate_to_ram() callback to migrate the entire large page
>> back to system memory.
>>
>> Signed-off-by: Balbir Singh <[email protected]>
>> Cc: David Hildenbrand <[email protected]>
>> Cc: Zi Yan <[email protected]>
>> Cc: Joshua Hahn <[email protected]>
>> Cc: Rakie Kim <[email protected]>
>> Cc: Byungchul Park <[email protected]>
>> Cc: Gregory Price <[email protected]>
>> Cc: Ying Huang <[email protected]>
>> Cc: Alistair Popple <[email protected]>
>> Cc: Oscar Salvador <[email protected]>
>> Cc: Lorenzo Stoakes <[email protected]>
>> Cc: Baolin Wang <[email protected]>
>> Cc: "Liam R. Howlett" <[email protected]>
>> Cc: Nico Pache <[email protected]>
>> Cc: Ryan Roberts <[email protected]>
>> Cc: Dev Jain <[email protected]>
>> Cc: Barry Song <[email protected]>
>> Cc: Lyude Paul <[email protected]>
>> Cc: Danilo Krummrich <[email protected]>
>> Cc: David Airlie <[email protected]>
>> Cc: Simona Vetter <[email protected]>
>> Cc: Ralph Campbell <[email protected]>
>> Cc: Mika Penttilä <[email protected]>
>> Cc: Matthew Brost <[email protected]>
>> Cc: Francois Dugast <[email protected]>
>> ---
>>  include/linux/huge_mm.h |  7 +++++++
>>  mm/huge_memory.c        | 36 ++++++++++++++++++++++++++++++++++++
>>  mm/memory.c             |  5 +++--
>>  3 files changed, 46 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index f327d62fc985..2d669be7f1c8 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -496,6 +496,8 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
>>  vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);
>> +vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf);
>> +
>>  extern struct folio *huge_zero_folio;
>>  extern unsigned long huge_zero_pfn;
>> @@ -671,6 +673,11 @@ static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
>>  	return 0;
>>  }
>> +static inline vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf)
>> +{
>> +	return 0;
>> +}
>> +
>>  static inline bool is_huge_zero_folio(const struct folio *folio)
>>  {
>>  	return false;
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 5291ee155a02..90a1939455dd 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1287,6 +1287,42 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>  }
>> +vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf)
>> +{
>> +	struct vm_area_struct *vma = vmf->vma;
>> +	vm_fault_t ret = 0;
>> +	spinlock_t *ptl;
>> +	swp_entry_t swp_entry;
>> +	struct page *page;
>> +
>> +	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
>> +		vma_end_read(vma);
>> +		return VM_FAULT_RETRY;
>> +	}
>> +
>> +	ptl = pmd_lock(vma->vm_mm, vmf->pmd);
>> +	if (unlikely(!pmd_same(*vmf->pmd, vmf->orig_pmd))) {
>> +		spin_unlock(ptl);
>> +		return 0;
>> +	}
>> +
>> +	swp_entry = pmd_to_swp_entry(vmf->orig_pmd);
>> +	page = pfn_swap_entry_to_page(swp_entry);
>> +	vmf->page = page;
>> +	vmf->pte = NULL;
>> +	if (trylock_page(vmf->page)) {
>
> We should be operating on a folio here. folio_trylock() + folio_get() +
> folio_unlock() + folio_put().
>
>> +		get_page(page);
>> +		spin_unlock(ptl);
>> +		ret = page_pgmap(page)->ops->migrate_to_ram(vmf);
>
> BTW, I was wondering whether it is really the right design to pass the vmf
> here. Likely the const vma+addr+folio could be sufficient. I did not look
> into all callbacks, though.
>
The vmf is used for the address and other bits. FYI, this is no different
from pte fault handling and migrate_to_ram(). I can do the folio
conversions.
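Something along these lines is what I have in mind for the folio conversion
(an untested sketch on top of the hunk above; it just wraps the existing
logic in folio_trylock()/folio_put() via page_folio(), holding the folio
lock and a reference across migrate_to_ram() as the pte path does):

vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	vm_fault_t ret = 0;
	spinlock_t *ptl;
	swp_entry_t swp_entry;
	struct page *page;
	struct folio *folio;

	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
		vma_end_read(vma);
		return VM_FAULT_RETRY;
	}

	ptl = pmd_lock(vma->vm_mm, vmf->pmd);
	if (unlikely(!pmd_same(*vmf->pmd, vmf->orig_pmd))) {
		spin_unlock(ptl);
		return 0;
	}

	swp_entry = pmd_to_swp_entry(vmf->orig_pmd);
	page = pfn_swap_entry_to_page(swp_entry);
	folio = page_folio(page);
	vmf->page = page;
	vmf->pte = NULL;

	/* Hold the folio lock + a reference across migrate_to_ram(). */
	if (folio_trylock(folio)) {
		folio_get(folio);
		spin_unlock(ptl);
		ret = page_pgmap(page)->ops->migrate_to_ram(vmf);
		folio_unlock(folio);
		folio_put(folio);
	} else {
		/* Lock contention: return and let the fault be retried. */
		spin_unlock(ptl);
	}

	return ret;
}

Balbir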
