Re: [PATCH v2] mm/memory-failure: Support disabling soft offline for HugeTLB pages

Kyle Meyer Tue, 16 Sep 2025 11:00:58 -0700

On Tue, Sep 16, 2025 at 03:20:49PM +0000, Luck, Tony wrote:
> >> > Reported-by: Shawn Fan <shawn....@intel.com>
> >> 
> >> Interesting.  What did Shawn report? (Closes:!).
> >
> > Tony or Shawn, could you please point me to the original report? Thanks!
> 
> Original report is internal to Intel, so no useful link for the community (but
> I still wanted to give credit).
> 
> Recap of the original problem: some BIOSes keep track of an error
> threshold per rank and use this GHES mechanism to report that the
> threshold was exceeded on the rank.
> 
> Systems that stay up a long time can accumulate enough soft errors
> to trigger this threshold. But the action of taking a page offline isn't
> going to help. For a 4K page this is merely annoying. For a 1G page
> it can mess things up badly.
> 
> My original patch for this just skipped the GHES->offline process
> for huge pages. But I wasn't aware of the sysctl control. That provides
> a better solution.


Tony, does that mean you're OK with using the existing sysctl interface? If
so, I'll just send a separate patch to update the sysfs-memory-page-offline
documentation and drop the rest.

Thanks,
Kyle Meyer
