** Description changed:

  [Impact]
  rasdaemon does not know how to decode MCE events from various new platforms, making it difficult to interpret errors reported up from the platform.

  [Test Case]
+ On an AMD SMCA-capable system:
+ #!/bin/bash
+ modprobe mce-inject

- [Fix]
+ EINJ=/sys/kernel/debug/mce-inject
+
+ # See /sys/kernel/debug/mce-inject/README
+
+ echo hw > $EINJ/flags
+ echo 0x9c2030000000011b > $EINJ/status
+ echo 0x040000035dd8bfc0 > $EINJ/addr
+ echo 0x0000c2030b404000 > $EINJ/synd
+ echo 0 > $EINJ/bank
+
+ # Wait for MCE to appear in dmesg
+ sudo ras-mc-ctl --errors
+ There should be a new MCE event in the output:
+ 1 2020-04-13 19:19:55 +0000 error: Deferred error, no action required., CPU 2, bank Load Store Unit (bank=0), mcg mcgstatus=0, mci UECC, mcgcap=0x0000011c, status=0x9c2030000000011b, addr=0x35dd8bfc0, walltime=0x5e94bb5d, cpuid=0x00830f10
+
+
+ For Skylake, I regression tested by using mce-test with the "corrected" test, as I'm not sure how to inject a Skylake-specific event there.
+ git clone https://github.com/andikleen/mce-inject
+ cd mce-inject
+ make
+ sudo ./mce-inject < test/corrected
+ sudo ras-mc-ctl --errors
+ No Memory errors.
+
+ No PCIe AER errors.
+
+ No Extlog errors.
+
+ MCE events:
+ 1 2020-04-14 00:13:07 +0000 error: No Error, mcg mcgstatus=0, mci Corrected_error Error_enabled, mcgcap=0x0f000814, status=0x9400000000000000, addr=0x0000abcd, walltime=0x5e950014, cpuid=0x00050654, bank=0x00000001
+ 2 2020-04-14 00:13:07 +0000 error: No Error, mcg mcgstatus=0, mci Corrected_error Error_enabled, mcgcap=0x0f000814, status=0x9400000000000000, addr=0x00001234, walltime=0x5e950014, cpu=0x00000001, cpuid=0x00050654, apicid=0x00000002, bank=0x00000002
+ [Regression Risk]
+ The new code should only run on the newly supported systems, so regressions should be restricted to those systems. There, a bug in the decoding code could cause problems such as a crash in rasdaemon. That risk is mitigated by testing on the newly supported platforms.
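
To sanity-check the decoded output against the injected values, the two status words from the test case can be broken into flag bits by hand. The helper below is only a sketch: mca_status_flags is a hypothetical name, not part of rasdaemon or mce-inject. The bit positions are assumed to follow the kernel's MCI_STATUS_* definitions, and the SYNDV, CECC, UECC and DEFERRED names apply to the AMD interpretation only.

```bash
#!/bin/bash
# Hypothetical helper (not part of rasdaemon or mce-inject): list the flag
# bits set in a 64-bit MCA status word given as a hex string.
# Assumed bit positions: 63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV,
# 57 PCC; on AMD additionally 53 SYNDV, 46 CECC, 45 UECC, 44 DEFERRED.
mca_status_flags() {
    local hex hi flags entry bit name
    hex=${1#0x}
    # Left-pad to 16 hex digits so the upper half is always 8 digits.
    while [ "${#hex}" -lt 16 ]; do hex="0$hex"; done
    # All flags checked here live in the upper 32 bits; working on that half
    # avoids relying on how the shell handles values above 2^63 - 1.
    hi=$(( 0x${hex%????????} ))
    flags=""
    for entry in 63:VAL 62:OVER 61:UC 60:EN 59:MISCV 58:ADDRV 57:PCC \
                 53:SYNDV 46:CECC 45:UECC 44:DEFERRED; do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( (hi >> (bit - 32)) & 1 )) -eq 1 ]; then
            flags="${flags:+$flags }$name"
        fi
    done
    echo "$flags"
}

mca_status_flags 0x9c2030000000011b   # prints: VAL EN MISCV ADDRV SYNDV UECC DEFERRED
mca_status_flags 0x9400000000000000   # prints: VAL EN ADDRV
```

The first result lines up with the "Deferred error" and "mci UECC" fields in the AMD event, and the second (UC clear, EN set) with "Corrected_error Error_enabled" in the Skylake events.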

** Changed in: rasdaemon (Ubuntu Eoan)
       Status: New => In Progress

** Changed in: rasdaemon (Ubuntu Bionic)
       Status: New => In Progress

https://bugs.launchpad.net/bugs/1871965

Title:
  new platform support: Intel SkyLake, AMD Scalable MCA