** Description changed:

  [Impact]
  rasdaemon does not know how to decode MCE events from various new platforms, making it difficult to interpret errors reported by the platform.
  
  [Test Case]
+ On an AMD SMCA-capable system:
+ #!/bin/bash
+ modprobe mce-inject
  
- [Fix]
+ EINJ=/sys/kernel/debug/mce-inject
+ 
+ # See /sys/kernel/debug/mce-inject/README
+ 
+ echo hw > "$EINJ/flags"
+ echo 0x9c2030000000011b > "$EINJ/status"
+ echo 0x040000035dd8bfc0 > "$EINJ/addr"
+ echo 0x0000c2030b404000 > "$EINJ/synd"
+ # Writing the bank number triggers the injection, so it goes last
+ echo 0 > "$EINJ/bank"
+ 
+ # Wait for MCE to appear in dmesg
+ sudo ras-mc-ctl --errors
+ There should be a new MCE event in the output:
+ 1 2020-04-13 19:19:55 +0000 error: Deferred error, no action required., CPU 2, bank Load Store Unit (bank=0), mcg mcgstatus=0, mci UECC, mcgcap=0x0000011c, status=0x9c2030000000011b, addr=0x35dd8bfc0, walltime=0x5e94bb5d, cpuid=0x00830f10
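The injection steps above can be wrapped in a reusable function. This is a minimal sketch, not part of rasdaemon or the kernel. The function name inject_mce and its directory argument are hypothetical. It writes bank last because, as I understand the kernel's mce-inject README, setting the bank is what triggers the injection. You can pass a scratch directory instead of /sys/kernel/debug/mce-inject to exercise the function without injecting anything.

```bash
#!/bin/bash
# Hypothetical helper wrapping the mce-inject debugfs writes from the test case.
# Usage: inject_mce DIR FLAGS STATUS ADDR SYND BANK
# DIR is normally /sys/kernel/debug/mce-inject (requires root).
inject_mce() {
    local dir=$1 flags=$2 status=$3 addr=$4 synd=$5 bank=$6

    # Fail early if the debugfs directory is missing (module not loaded,
    # debugfs not mounted, or a typo in the path).
    if [[ ! -d $dir ]]; then
        echo "inject_mce: $dir not found (is mce-inject loaded?)" >&2
        return 1
    fi

    echo "$flags"  > "$dir/flags"
    echo "$status" > "$dir/status"
    echo "$addr"   > "$dir/addr"
    echo "$synd"   > "$dir/synd"
    # Written last: per the mce-inject README, setting the bank triggers
    # the injection.
    echo "$bank"   > "$dir/bank"
}

# Example with the values from the AMD test case (run as root):
# inject_mce /sys/kernel/debug/mce-inject hw \
#     0x9c2030000000011b 0x040000035dd8bfc0 0x0000c2030b404000 0
```

The "hw" flag raises a real machine check. The status value used here has PCC (bit 57) clear, which the README notes is the case that could otherwise panic the system.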
+ 
+ 
+ For Skylake, I regression-tested using the mce-inject tool with its "corrected" test case, since I'm not sure how to inject a Skylake-specific event:
+ git clone https://github.com/andikleen/mce-inject
+ cd mce-inject
+ make
+ sudo ./mce-inject < test/corrected
+ sudo ras-mc-ctl --errors
+ No Memory errors.
+ 
+ No PCIe AER errors.
+ 
+ No Extlog errors.
+ 
+ MCE events:
+ 1 2020-04-14 00:13:07 +0000 error: No Error, mcg mcgstatus=0, mci Corrected_error Error_enabled, mcgcap=0x0f000814, status=0x9400000000000000, addr=0x0000abcd, walltime=0x5e950014, cpuid=0x00050654, bank=0x00000001
+ 2 2020-04-14 00:13:07 +0000 error: No Error, mcg mcgstatus=0, mci Corrected_error Error_enabled, mcgcap=0x0f000814, status=0x9400000000000000, addr=0x00001234, walltime=0x5e950014, cpu=0x00000001, cpuid=0x00050654, apicid=0x00000002, bank=0x00000002
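rasdaemon prints different flags in the two reports above: Deferred/UECC for the AMD event and Corrected_error Error_enabled for the Skylake ones. These flags come from bits in the MCi_STATUS value. Below is a minimal sketch for decoding them by hand. It assumes the bit positions defined in the Linux kernel's arch/x86/include/asm/mce.h (VAL 63 down to PCC 57; on AMD, SyndV 53 and Deferred 44), plus the SMCA UECC (45) and CECC (46) bits. decode_mca_status is a hypothetical helper, not rasdaemon's decoder.

```bash
#!/bin/bash
# Hypothetical helper: list the MCi_STATUS flag bits set in a 64-bit value.
# Usage: decode_mca_status VENDOR STATUS   (VENDOR is "amd" or "intel")
decode_mca_status() {
    # Bash arithmetic is signed 64-bit, so a status with bit 63 set wraps to
    # a negative number; (( (s >> n) & 1 )) still extracts bit n correctly.
    local vendor=$1 s=$(( $2 ))

    # Architectural bits shared by Intel and AMD.
    local names=(63:VAL 62:OVER 61:UC 60:EN 59:MISCV 58:ADDRV 57:PCC)
    # AMD-specific bits (assumed SMCA layout for UECC/CECC).
    if [[ $vendor == amd ]]; then
        names+=(53:SyndV 46:CECC 45:UECC 44:Deferred)
    fi

    local entry bit out=()
    for entry in "${names[@]}"; do
        bit=${entry%%:*}
        if (( (s >> bit) & 1 )); then
            out+=("${entry#*:}")
        fi
    done

    # Bits 15:0 hold the MCA error code.
    printf '%s errcode=0x%04x\n' "${out[*]}" $(( s & 0xffff ))
}

decode_mca_status amd 0x9c2030000000011b
# prints: VAL EN MISCV ADDRV SyndV UECC Deferred errcode=0x011b
decode_mca_status intel 0x9400000000000000
# prints: VAL EN ADDRV errcode=0x0000
```

In the AMD status, UC is clear while Deferred and UECC are set, which fits the "Deferred error, no action required." line. In the Skylake status only VAL, EN and ADDRV are set, which is a corrected error.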
+ 
  
  [Regression Risk]
+ The new code should only run on the newly supported systems, so any regressions should be confined to them. On those systems, a bug in the decoding code could cause problems such as a rasdaemon crash. This risk is mitigated by testing on the newly supported platforms.

** Changed in: rasdaemon (Ubuntu Eoan)
       Status: New => In Progress

** Changed in: rasdaemon (Ubuntu Bionic)
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1871965

Title:
  new platform support: Intel SkyLake, AMD Scalable MCA

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rasdaemon/+bug/1871965/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs