>> > Reported-by: Shawn Fan <shawn....@intel.com>
>> 
>> Interesting.  What did Shawn report? (Closes:!).
>
> Tony or Shawn, could you please point me to the original report? Thanks!

Original report is internal to Intel, so no useful link for the community (but
I still wanted to give credit).

Recap of the original problem: some BIOSes keep track of an error threshold
per rank and use this GHES mechanism to report that the threshold has been
exceeded on that rank.

Systems that stay up a long time can accumulate enough soft errors
to trigger this threshold. But the action of taking a page offline isn't
going to help. For a 4K page this is merely annoying. For a 1G page
it can mess things up badly.

My original patch for this just skipped the GHES->offline process
for huge pages. But I wasn't aware of the sysctl control. That provides
a better solution.

-Tony
