On 2026/6/9 17:32, Nico Pache wrote:
On Tue, Jun 9, 2026 at 3:26 AM David Hildenbrand (Arm) <[email protected]> wrote:

On 6/9/26 11:06, Nico Pache wrote:
On Mon, Jun 8, 2026 at 8:57 AM David Hildenbrand (Arm) <[email protected]> wrote:

On 6/6/26 12:28, Lance Yang wrote:


Looks broken for swap PTEs in PMD collapse ...

collapse_scan_pmd() allows them up to max_ptes_swap and record them in
unmapped, but they don't get a bit in mthp_present_ptes. And then
mthp_collapse() does the check above:

Right. I assumed this is implicitly handled by the optimization in 
collapse_scan_pmd:

         if (enabled_orders != BIT(HPAGE_PMD_ORDER))
                 max_ptes_none = KHUGEPAGED_MAX_PTES_LIMIT;

But we perform the check a second time.


nr_occupied_ptes >= nr_ptes - max_ptes_none

So max_ptes_none=0 + 511 present PTEs + one allowed swap PTE won't even
call collapse_huge_page() for PMD order.

Shouldn't we account for them in the PMD-order check? Something like:

if (is_pmd_order(order))
       nr_occupied_ptes += unmapped;

This solution seems good for a temporary fixup. but longterm we may
want something else. I'm still not sure how we plan on supporting
swapin without causing creep. So I'd be ok with adding a fix for
legacy PMD behavior until we know how to handle mTHP creep correctly.

As an alternative, we could either 1) skip the check there for
pmd order (as the check was already done); or 2) introduce+maintain
a bitmap that tracks non-present PTEs.

@@ -1475,7 +1477,9 @@ static enum scan_result mthp_collapse(struct mm_struct 
*mm,
                 nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, 
offset,
                                                       offset + nr_ptes);

-               if (nr_occupied_ptes >= nr_ptes - max_ptes_none) {
+               /* Check was already done in the caller. */
+               if (is_pmd_order(order) ||
+                   nr_occupied_ptes >= nr_ptes - max_ptes_none) {
                         enum scan_result ret;

                         collapse_address = address + offset * PAGE_SIZE;

2) would probably be cleanest long-term.

That would be best for future swapin support in mTHP, but I still
don't think it solves the creep issue.

It wouldn't, we'd simply maintain the state we collect + rely on in separate
bitmaps. On swapin, we'd have to update/refresh bitmaps I guess.

Yeah, I'm saying for the future, it obviously solves this issue here
as well, but if we have positional tracking of the swapout, shared,
and none PTEs, I think we can use this to determine whether the
collapse would lead to creep. If we detect creep would happen it may
be best to automatically collapse to the N+1 (or greater) candidate.
Just thinking outloud here.


Perhaps we could combine the
two bitmaps to determine if it would make the future collapse eligible
again? Not sure but ill start thinking about it.

Should I send a fixup for this using Lance's solution? Or does Lance
want to send a patch out with the fixes tag?

If Lance could send a fixup, explaining the situation, that would be nice.

Sure, happy to send a fixup :P

Should I send it as a fixup to be folded into this patch, or as a
separate patch with a Fixes: tag?

Will get one out soon :)

OK, I'd appreciate that :)

Cheers!


Reply via email to