On Tue, Jun 9, 2026 at 3:26 AM David Hildenbrand (Arm) <[email protected]> wrote: > > On 6/9/26 11:06, Nico Pache wrote: > > On Mon, Jun 8, 2026 at 8:57 AM David Hildenbrand (Arm) <[email protected]> > > wrote: > >> > >> On 6/6/26 12:28, Lance Yang wrote: > >>> > >>> > >>> Looks broken for swap PTEs in PMD collapse ... > >>> > >>> collapse_scan_pmd() allows them up to max_ptes_swap and record them in > >>> unmapped, but they don't get a bit in mthp_present_ptes. And then > >>> mthp_collapse() does the check above: > >> > >> Right. I assumed this is implicitly handled by the optimization in > >> collapse_scan_pmd: > >> > >> if (enabled_orders != BIT(HPAGE_PMD_ORDER)) > >> max_ptes_none = KHUGEPAGED_MAX_PTES_LIMIT; > >> > >> But we perform the check a second time. > >> > >>> > >>> nr_occupied_ptes >= nr_ptes - max_ptes_none > >>> > >>> So max_ptes_none=0 + 511 present PTEs + one allowed swap PTE won't even > >>> call collapse_huge_page() for PMD order. > >>> > >>> Shouldn't we account for them in the PMD-order check? Something like: > >>> > >>> if (is_pmd_order(order)) > >>> nr_occupied_ptes += unmapped; > > > > This solution seems good for a temporary fixup. but longterm we may > > want something else. I'm still not sure how we plan on supporting > > swapin without causing creep. So I'd be ok with adding a fix for > > legacy PMD behavior until we know how to handle mTHP creep correctly. > > > >> As an alternative, we could either 1) skip the check there for > >> pmd order (as the check was already done); or 2) introduce+maintain > >> a bitmap that tracks non-present PTEs. > >> > >> @@ -1475,7 +1477,9 @@ static enum scan_result mthp_collapse(struct > >> mm_struct *mm, > >> nr_occupied_ptes = > >> bitmap_weight_from(cc->mthp_present_ptes, offset, > >> offset + nr_ptes); > >> > >> - if (nr_occupied_ptes >= nr_ptes - max_ptes_none) { > >> + /* Check was already done in the caller. */ > >> + if (is_pmd_order(order) || > >> + nr_occupied_ptes >= nr_ptes - max_ptes_none) { > >> enum scan_result ret; > >> > >> collapse_address = address + offset * PAGE_SIZE; > >> > >> 2) would probably be cleanest long-term. > > > > That would be best for future swapin support in mTHP, but I still > > don't think it solves the creep issue. > > It wouldn't, we'd simply maintain the state we collect + rely on in separate > bitmaps. On swapin, we'd have to update/refresh bitmaps I guess.
Yeah, I'm saying for the future, it obviously solves this issue here as well, but if we have positional tracking of the swapout, shared, and none PTEs, I think we can use this to determine whether the collapse would lead to creep. If we detect creep would happen it may be best to automatically collapse to the N+1 (or greater) candidate. Just thinking outloud here. > > > Perhaps we could combine the > > two bitmaps to determine if it would make the future collapse eligible > > again? Not sure but ill start thinking about it. > > > > Should I send a fixup for this using Lance's solution? Or does Lance > > want to send a patch out with the fixes tag? > > If Lance could send a fixup, explaining the situation, that would be nice. OK, I'd appreciate that :) Cheers, -- Nico > > -- > Cheers, > > David >
