Jane Chu wrote:
> Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine poison
> granularity") changed nfit_handle_mce() callback to report badrange for
> each poison at an alignment indicated by 1ULL << MCI_MISC_ADDR_LSB(mce->misc)
> instead of the hardcoded L1_CACHE_BYTES. However recently on a server
> populated with Intel DCPMEM v2 dimms, it appears that
> 1UL << MCI_MISC_ADDR_LSB(mce->misc) turns out is 4KiB, or 8 512-byte blocks.
> Consequently, injecting 2 back-to-back poisons via ndctl, and it reports
> 8 poisons.
>
> [29076.590281] {3}[Hardware Error]: physical_address: 0x00000040a0602400
> [..]
> [29076.619447] Memory failure: 0x40a0602: recovery action for dax page: Recovered
> [29076.627519] mce: [Hardware Error]: Machine check events logged
> [29076.634033] nfit ACPI0012:00: addr in SPA 1 (0x4080000000, 0x1f80000000)
> [29076.648805] nd_bus ndbus0: XXX nvdimm_bus_add_badrange: (0x40a0602000, 0x1000)
> [..]
> [29078.634817] {4}[Hardware Error]: physical_address: 0x00000040a0602600
> [..]
> [29079.595327] nfit ACPI0012:00: addr in SPA 1 (0x4080000000, 0x1f80000000)
> [29079.610106] nd_bus ndbus0: XXX nvdimm_bus_add_badrange: (0x40a0602000, 0x1000)
> [..]
> {
>   "dev":"namespace0.0",
>   "mode":"fsdax",
>   "map":"dev",
>   "size":33820770304,
>   "uuid":"a1b0f07f-747f-40a8-bcd4-de1560a1ef75",
>   "sector_size":512,
>   "align":2097152,
>   "blockdev":"pmem0",
>   "badblock_count":8,
>   "badblocks":[
>     {
>       "offset":8208,
>       "length":8,
>       "dimms":[
>         "nmem0"
>       ]
>     }
>   ]
> }
>
> So, 1UL << MCI_MISC_ADDR_LSB(mce->misc) is an unreliable indicator for poison
> radius and shouldn't be used. More over, as each injected poison is being
> reported independently, any alignment under 512-byte appear works:
> L1_CACHE_BYTES (though inaccurate), or 256-bytes (as ars->length reports),
> or 512-byte.
>
> To get around this issue, 512-bytes is chosen as the alignment because
>   a. it happens to be the badblock granularity,
>   b. ndctl inject-error cannot inject more than one poison to a 512-byte
>      block,
>   c. architecture agnostic
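For reference, the arithmetic behind the 8-block report can be sketched in C. This is a minimal userspace illustration, not the nfit driver code: `struct badrange`, `mce_badrange()` and `badrange_sectors()` are hypothetical names, and `lsb` stands in for MCI_MISC_ADDR_LSB(mce->misc), assumed here to be 12 (4KiB) as the report implies.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512ULL /* badblocks granularity */

/* Hypothetical (start, length) pair, mirroring the badrange log lines. */
struct badrange {
	uint64_t start;
	uint64_t len;
};

/*
 * Align a poisoned physical address down to the MCE-reported granularity,
 * 1ULL << lsb, where lsb stands in for MCI_MISC_ADDR_LSB(mce->misc).
 */
static struct badrange mce_badrange(uint64_t phys, unsigned int lsb)
{
	assert(lsb < 64); /* shifting a 64-bit value by >= 64 is undefined */
	uint64_t align = 1ULL << lsb;
	struct badrange r = { phys & ~(align - 1), align };
	return r;
}

/* Number of 512-byte badblocks covered by a sector-aligned range. */
static uint64_t badrange_sectors(struct badrange r)
{
	return r.len / SECTOR_SIZE;
}
```

With lsb = 12, both injected addresses, 0x40a0602400 and 0x40a0602600, map to (0x40a0602000, 0x1000), which is 8 sectors; that matches the two nvdimm_bus_add_badrange lines and "badblock_count":8 above.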
I'm failing to see the kernel bug. Yes, you injected fewer than 8 "badblocks" of poison and the hardware reported 8 blocks of poison, but that is not the kernel's fault; that is what the hardware reported. What happens when the hardware really does detect 8 consecutive blocks of poison and this implementation decides to record only 1 at a time?

It seems the fix you want is for the hardware to report precise error bounds, and in this case 1UL << MCI_MISC_ADDR_LSB(mce->misc) does not have that precision. However, the ARS engine can likely return the precise error ranges. So I think the fix is to use the address range indicated by 1UL << MCI_MISC_ADDR_LSB(mce->misc) to filter the results of a short ARS scrub request that asks the device for the precise error list.
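A rough sketch of that filtering step, assuming the ARS engine hands back (address, length) error records. `struct err_range` and `filter_ars()` are hypothetical names, loosely modeled on ARS status records rather than taken from the ACPI or libnvdimm definitions. The MCE window bounds which records are kept, and each surviving piece is rounded out to 512-byte badblocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512ULL /* badblocks granularity */

/* Hypothetical stand-in for one ARS error record (byte address + length). */
struct err_range {
	uint64_t addr;
	uint64_t len;
};

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/*
 * Keep only the parts of the ARS records that fall inside the MCE window
 * [win_start, win_start + win_len), rounded out to whole 512-byte sectors.
 * Assumes the window itself is sector aligned (true for 1ULL << lsb with
 * lsb >= 9), so rounding never extends a range past the window.
 * Returns the number of ranges written to out (at most max_out).
 */
static size_t filter_ars(uint64_t win_start, uint64_t win_len,
			 const struct err_range *ars, size_t nr_ars,
			 struct err_range *out, size_t max_out)
{
	uint64_t win_end = win_start + win_len;
	size_t n = 0;

	for (size_t i = 0; i < nr_ars && n < max_out; i++) {
		uint64_t s = max_u64(ars[i].addr, win_start);
		uint64_t e = min_u64(ars[i].addr + ars[i].len, win_end);

		if (s >= e)
			continue; /* record lies outside the MCE window */

		/* Round out to sector boundaries. */
		s &= ~(SECTOR_SIZE - 1);
		e = (e + SECTOR_SIZE - 1) & ~(SECTOR_SIZE - 1);
		out[n].addr = s;
		out[n].len = e - s;
		n++;
	}
	return n;
}
```

For example, with the 4KiB window (0x40a0602000, 0x1000) and two 256-byte ARS records at 0x40a0602400 and 0x40a0602600, this yields two one-sector ranges rather than eight sectors, and a record outside the window is dropped. If the device really does report 8 consecutive poisoned blocks, all of them survive the filter.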