On 2026/5/27 22:06, Breno Leitao wrote: > get_any_page() collapses every HWPoisonHandlable() rejection into a > single -EIO via the __get_hwpoison_page() -> -EBUSY -> shake_page() > -> retry path. That is correct for the transient case (a userspace > folio briefly off LRU during migration or compaction, which a later > shake can drag back), but wrong for stable kernel-owned pages: slab, > page-table, large-kmalloc and PG_reserved pages will never become > HWPoisonHandlable(), so the retry loop is wasted work and the final > -EIO loses the "this is structurally unrecoverable" information. > memory_failure() then maps -EIO into MF_MSG_GET_HWPOISON, which the > panic-on-unrecoverable sysctl deliberately does not act on. > > Introduce HWPoisonKernelOwned(), a small predicate that positively > identifies pages the hwpoison handler cannot recover from: > > HWPoisonKernelOwned(p, flags) := > !(MF_SOFT_OFFLINE && page_has_movable_ops(p)) && > (PageReserved(p) || PageSlab(p) || > PageTable(p) || PageLargeKmalloc(p)) > > The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors the > same exception in HWPoisonHandlable(): soft-offline is allowed to > migrate movable_ops pages even though they are not on the LRU, and > we must not pre-empt that with an unrecoverable verdict. > > The list is intentionally not exhaustive. vmalloc and kernel-stack > pages, for example, do not carry a page_type bit and would need a > different oracle; they keep going through the existing retry path > unchanged. This is the smallest set we can identify with certainty > by page type. > > Wire the helper into the top of get_any_page() to short-circuit > those pages before the retry loop runs. On a hit, drop the caller's > MF_COUNT_INCREASED reference (if any) and return -ENOTRECOVERABLE > straight away. Pages outside the helper's positive list still take > the existing retry path and return -EIO, leaving operator-visible > behaviour for those cases unchanged. > > Extend the unhandlable-page pr_err() to fire for either errno and > update the get_hwpoison_page() kerneldoc to document the new return. > > memory_failure() still folds every negative return into > MF_MSG_GET_HWPOISON via its existing "else if (res < 0)" branch, so > this patch on its own only changes the errno that soft_offline_page() > can propagate to its callers. A follow-up wires -ENOTRECOVERABLE > through memory_failure() and reports MF_MSG_KERNEL for the > unrecoverable cases, which is what the > panic_on_unrecoverable_memory_failure sysctl observes.
Thanks for your patch. > > Suggested-by: David Hildenbrand <[email protected]> > Suggested-by: Lance Yang <[email protected]> > Signed-off-by: Breno Leitao <[email protected]> > --- > mm/memory-failure.c | 42 ++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 40 insertions(+), 2 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index f4d3e6e20e13..8f63bdfeff8f 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1325,6 +1325,28 @@ static inline bool HWPoisonHandlable(struct page > *page, unsigned long flags) > return PageLRU(page) || is_free_buddy_page(page); > } > > +/* > + * Positive identification of pages the hwpoison handler cannot recover. > + * These page types are owned by kernel internals (no userspace mapping > + * to unmap, no file mapping to invalidate, no migration target), so the > + * shake_page() / retry loop in get_any_page() can never turn them into > + * something HWPoisonHandlable() will accept. Short-circuit them to > + * -ENOTRECOVERABLE so callers can panic on operator request instead of > + * spinning through retries that exit as a transient-looking -EIO. > + * > + * The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors > + * HWPoisonHandlable(): soft-offline is allowed to migrate movable_ops > + * pages even though they are not on the LRU. > + */ > +static inline bool HWPoisonKernelOwned(struct page *page, unsigned long > flags) > +{ > + if ((flags & MF_SOFT_OFFLINE) && page_has_movable_ops(page)) > + return false; > + > + return PageReserved(page) || PageSlab(page) || Once shake_page finds a lightweight range-based way to shrink slab, slab pages could be freed into buddy and above PageSlab test should be removed then. Maybe add a TODO or XXX here? > + PageTable(page) || PageLargeKmalloc(page); I'm not sure but is it safe or a common way to test PageReserved, PageSlab, PageTable and PageLargeKmalloc without extra page refcnt? Apart from the above nits, this patch looks good to me. Thanks. .
