On 2025/9/18 02:59, Kyle Meyer wrote:
On Wed, Sep 17, 2025 at 06:35:14AM +0000, Fan, Shawn wrote:
My original patch for this just skipped the GHES->offline process
for huge pages. But I wasn't aware of the sysctl control. That provides
a better solution.

Tony, does that mean you're OK with using the existing sysctl interface? If
so, I'll just send a separate patch to update the sysfs-memory-page-offline
documentation and drop the rest.

Kyle,

It depends on which camp the external customer that reported this
falls into:

1) "I'm OK disabling all soft offline requests".

or the:

2) "I'd like 4K pages to still go offline if the BIOS asks, just not any huge 
pages".
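
For illustration only, the two options above can be read as a policy check
made before a soft offline request is acted on. This is a minimal sketch;
the enum, its values, and should_soft_offline() are hypothetical names, not
existing kernel identifiers.

```c
#include <stdbool.h>

/* Hypothetical policy names, for illustration only (not kernel symbols). */
enum ghes_soft_offline_policy {
	POLICY_OFFLINE_ALL,	/* honour every soft offline request */
	POLICY_DISABLE_ALL,	/* option 1: ignore all soft offline requests */
	POLICY_SKIP_HUGE_PAGES,	/* option 2: offline 4K pages, skip huge pages */
};

/* Return true if a soft offline request for this page should proceed. */
static bool should_soft_offline(enum ghes_soft_offline_policy policy,
				bool is_huge_page)
{
	switch (policy) {
	case POLICY_DISABLE_ALL:
		return false;
	case POLICY_SKIP_HUGE_PAGES:
		return !is_huge_page;
	case POLICY_OFFLINE_ALL:
	default:
		return true;
	}
}
```

Under option 2 the outcome depends on the page type; under option 1 it does
not.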

Shawn: Can you please find out?


-> Prefer the 2nd option,  "4K pages still go offline if the BIOS asks, just not any 
huge pages."

OK, thank you.

Does that mean they want to avoid offlining transparent huge pages as well?

Thanks,
Kyle Meyer


Hi, Shawn,

Memory access is typically interleaved between channels, so when the
per-rank threshold is exceeded, soft-offlining the last accessed address
seems unreasonable, regardless of whether it's a 4KB page or a huge
page. The errors accumulate at the rank level, but the action is taken
on the specific page that happened to trigger the threshold, which
doesn't address the underlying issue.
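
A toy model of that point, assuming a single per-rank counter and a
hypothetical helper blamed_pfn() (this is not how any firmware or the GHES
driver is known to be implemented): every page contributes to the count,
but only the page whose error arrives last is reported.

```c
#include <stddef.h>

/*
 * Toy model: corrected errors on one rank are counted together.
 * error_pfns[i] is the page frame that saw the i-th error.
 * Return the page frame reported when the rank counter reaches the
 * threshold, or -1 if the threshold is never reached.
 */
static long blamed_pfn(const long *error_pfns, size_t n, unsigned int threshold)
{
	unsigned int rank_errors = 0;

	for (size_t i = 0; i < n; i++) {
		rank_errors++;		/* counted per rank, not per page */
		if (rank_errors >= threshold)
			return error_pfns[i];	/* last page to err gets blamed */
	}
	return -1;
}
```

With one error each on four different pages and a threshold of four, the
fourth page is reported even though it is no worse than the other three.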

I prefer the first option: disabling all soft offline requests from the
GHES driver.

Thanks.
Shuai
