memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE

Breno Leitao Tue, 09 Jun 2026 04:09:01 -0700

get_any_page() collapses every HWPoisonHandlable() rejection into a
single -EIO via the __get_hwpoison_page() -> -EBUSY -> shake_page()
-> retry path.  That is correct for the transient case (a userspace
folio briefly off LRU during migration or compaction, which a later
shake can drag back), but wrong for stable kernel-owned pages: slab,
page-table, large-kmalloc and PG_reserved pages will never become
HWPoisonHandlable(), so the retry loop is wasted work and the final
-EIO loses the "this is structurally unrecoverable" information.
memory_failure() then maps -EIO into MF_MSG_GET_HWPOISON, which the
panic-on-unrecoverable sysctl deliberately does not act on.


Introduce HWPoisonKernelOwned(), a small predicate that positively
identifies pages the hwpoison handler cannot recover from:

  HWPoisonKernelOwned(p, flags) :=
      !(MF_SOFT_OFFLINE && page_has_movable_ops(p)) &&
      (PageReserved(p) ||
       PageSlab(head) || PageTable(head) || PageLargeKmalloc(head))

  where head = compound_head(p).

PG_reserved is a per-page flag (PF_NO_COMPOUND) and is tested on the
page directly.  The slab, page-table and large-kmalloc page-type bits
are only stored on the head page, so those tests resolve the compound
head first, then re-read compound_head(page) afterwards: a concurrent
split or compound free that moves head invalidates the just-read flags
and the loop retries.  The lookup still takes no refcount, mirroring
the rest of get_any_page(); the recheck closes the common split race,
and a residual free->alloc->free in the same window can only mis-tag
a genuinely poisoned page, never reclassify a handlable one.

The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors the
same exception in HWPoisonHandlable(): soft-offline is allowed to
migrate movable_ops pages even though they are not on the LRU, and
we must not pre-empt that with an unrecoverable verdict.

The list is intentionally not exhaustive.  vmalloc and kernel-stack
pages, for example, do not carry a page_type bit and would need a
different oracle; they keep going through the existing retry path
unchanged.  This is the smallest set we can identify with certainty
by page type.

Wire the helper into the top of get_any_page() to short-circuit
those pages before the retry loop runs.  On a hit, drop the caller's
MF_COUNT_INCREASED reference (if any) and return -ENOTRECOVERABLE
straight away.  Pages outside the helper's positive list still take
the existing retry path and return -EIO, leaving operator-visible
behaviour for those cases unchanged.

Extend the unhandlable-page pr_err() to fire for either errno and
update the get_hwpoison_page() kerneldoc to document the new return.

memory_failure() still folds every negative return into
MF_MSG_GET_HWPOISON via its existing "else if (res < 0)" branch, so
this patch on its own only changes the errno that soft_offline_page()
can propagate to its callers.  A follow-up wires -ENOTRECOVERABLE
through memory_failure() and reports MF_MSG_KERNEL for the
unrecoverable cases, which is what the
panic_on_unrecoverable_memory_failure sysctl observes.

Suggested-by: David Hildenbrand <[email protected]>
Suggested-by: Lance Yang <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
---
 mm/memory-failure.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f4d3e6e20e13..eed9de387694 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1325,6 +1325,46 @@ static inline bool HWPoisonHandlable(struct page *page, 
unsigned long flags)
        return PageLRU(page) || is_free_buddy_page(page);
 }
 
+/*
+ * Positive identification of pages the hwpoison handler cannot recover.
+ * These page types are owned by kernel internals (no userspace mapping
+ * to unmap, no file mapping to invalidate, no migration target), so the
+ * shake_page() / retry loop in get_any_page() can never turn them into
+ * something HWPoisonHandlable() will accept.  Short-circuit them to
+ * -ENOTRECOVERABLE so callers can panic on operator request instead of
+ * spinning through retries that exit as a transient-looking -EIO.
+ *
+ * The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors
+ * HWPoisonHandlable(): soft-offline is allowed to migrate movable_ops
+ * pages even though they are not on the LRU.
+ */
+static inline bool HWPoisonKernelOwned(struct page *page, unsigned long flags)
+{
+       struct page *head;
+
+       if ((flags & MF_SOFT_OFFLINE) && page_has_movable_ops(page))
+               return false;
+
+       /* PG_reserved is a per-page flag, never set on a compound page. */
+       if (PageReserved(page))
+               return true;
+
+       /*
+        * Page-type bits live only on the head page, so resolve any tail
+        * first.  The check takes no refcount; recheck the head afterwards
+        * so a concurrent split or compound free cannot leave us trusting
+        * a stale view.  A free->alloc->free in the same window is still
+        * possible but closing it would require taking a reference here.
+        */
+retry:
+       head = compound_head(page);
+       if (!(PageSlab(head) || PageTable(head) || PageLargeKmalloc(head)))
+               return false;
+       if (head != compound_head(page))
+               goto retry;
+       return true;
+}
+
 static int __get_hwpoison_page(struct page *page, unsigned long flags)
 {
        struct folio *folio = page_folio(page);
@@ -1371,6 +1411,19 @@ static int get_any_page(struct page *p, unsigned long 
flags)
        if (flags & MF_COUNT_INCREASED)
                count_increased = true;
 
+       /*
+        * Page types we know are kernel-owned and cannot be recovered.
+        * Short-circuit before the shake_page() / retry loop, which
+        * cannot turn any of these into something HWPoisonHandlable().
+        * Drop the caller's reference if MF_COUNT_INCREASED took one.
+        */
+       if (HWPoisonKernelOwned(p, flags)) {
+               if (count_increased)
+                       put_page(p);
+               ret = -ENOTRECOVERABLE;
+               goto out;
+       }
+
 try_again:
        if (!count_increased) {
                ret = __get_hwpoison_page(p, flags);
@@ -1418,7 +1471,7 @@ static int get_any_page(struct page *p, unsigned long 
flags)
                ret = -EIO;
        }
 out:
-       if (ret == -EIO)
+       if (ret == -EIO || ret == -ENOTRECOVERABLE)
                pr_err("%#lx: unhandlable page.\n", page_to_pfn(p));
 
        return ret;
@@ -1475,7 +1528,10 @@ static int __get_unpoison_page(struct page *page)
  *         -EIO for pages on which we can not handle memory errors,
  *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
  *         operations like allocation and free,
- *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
+ *         -EHWPOISON when the page is hwpoisoned and taken off from buddy,
+ *         -ENOTRECOVERABLE for kernel-owned pages identified by
+ *         HWPoisonKernelOwned() (PG_reserved, slab,
+ *         page-table, large-kmalloc) that the handler cannot recover.
  */
 static int get_hwpoison_page(struct page *p, unsigned long flags)
 {

-- 
2.53.0-Meta

[PATCH v9 2/6] mm/memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE

Reply via email to