On 7/14/2022 6:19 PM, Dan Williams wrote:
> Jane Chu wrote:
>> I meant to say there would be 8 calls to the nfit_handle_mce() callback,
>> one call for each poison with accurate address.
>>
>> Also, short ARS would find 2 poisons.
>>
>> I attached the console output, my annotation is prefixed with "<==".
>
> [29078.634817] {4}[Hardware Error]: physical_address: 0x00000040a0602600 <== 2nd poison @ 0x600
> [29078.642200] {4}[Hardware Error]: physical_address_mask: 0xffffffffffffff00
>
> Why is nfit_handle_mce() seeing a 4K address mask when the CPER record
> is seeing a 256-byte address mask?

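To make the two granularities concrete, here is a small sketch of what each mask covers. The address and the 256-byte mask are taken from the CPER record quoted above; the 4K mask value (0xfffffffffffff000) and the helper name span() are assumptions for illustration only, not values or functions from the kernel.

```python
# Illustrative sketch only, not kernel code.
ADDR = 0x00000040a0602600          # physical_address from the CPER record
CPER_MASK = 0xffffffffffffff00     # physical_address_mask from the CPER record
PAGE_MASK_4K = 0xfffffffffffff000  # assumed 4K mask on the nfit_handle_mce() side

def span(addr, mask):
    """Return (start, length) of the region that a 64-bit address mask describes."""
    length = (~mask & 0xffffffffffffffff) + 1  # cleared low bits give the size
    return addr & mask, length

for name, mask in (("CPER", CPER_MASK), ("4K", PAGE_MASK_4K)):
    start, length = span(ADDR, mask)
    print(f"{name}: start={start:#x} length={length}")
```

With the 256-byte mask the poison stays at 0x40a0602600; with a 4K mask it is rounded down to 0x40a0602000, so the 0x600 offset of the 2nd poison is no longer visible.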
Good question! One would think both GHES reporting and
nfit_handle_mce() are consuming the same mce record...
Who might know?

>
> Sigh, is this "firmware-first" causing the kernel to get bad information
> via the native mechanisms
>
> I would expect that if this test was truly worried about minimizing BIOS
> latency it would disable firmware-first error reporting. I wonder if
> that fixes the observed problem?

Could you elaborate on firmware-first error reporting, please? What are
the possible consequences of disabling it, and how does one disable it?

thanks!
-jane