Hello,

We currently have several servers reporting faulty memory through MCE.
Example dmesg output:

[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: Machine check events logged
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: CPU 12: Machine Check: 0 Bank 7: cc027c0000010091
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: TSC 0 ADDR 70fc337d80 MISC 50202086
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: PROCESSOR 0:306f2 TIME 1534938887 SOCKET 1 APIC 20 microcode 3d
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: Machine check events logged
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 7: 8c00004000010091
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: TSC 0 ADDR 70fb117d40 MISC 4268e886
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: PROCESSOR 0:306f2 TIME 1534938887 SOCKET 1 APIC 24 microcode 3d
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: CPU 15: Machine Check: 0 Bank 7: cc00008000010091
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: TSC 0 ADDR 70fb1b3ec0 MISC 142189886
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: PROCESSOR 0:306f2 TIME 1534938887 SOCKET 1 APIC 26 microcode 3d

Normally we verify such errors by checking the IPMI event log, but nothing shows up there. Neither IPMI nor ras-mc-ctl reports any errors.

We encountered this problem running kernel 4.12.0, based on openSUSE SLE15 at commit a906b62b3f80679eac4f38373492a871c5f3568e.

Is this an MCE kernel bug?

-- 
Kind regards
Daniel Aberger
Your Profihost Team

-------------------------------
Profihost AG
Expo Plaza 1
30539 Hannover
Germany

Tel.: +49 (511) 5151 8181 | Fax.: +49 (511) 5151 8282
URL: http://www.profihost.com | E-Mail: i...@profihost.com

Registered office: Hannover, VAT ID: DE813460827
Court of registration: Amtsgericht Hannover, registration no.: HRB 202350
Executive board: Cristoph Bluhm, Sebastian Bluhm, Stefan Priebe
Supervisory board: Prof. Dr. iur. Winfried Huck (chairman)
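
As a reading aid for the Bank 7 status words in the dmesg output above, here is a minimal Python sketch that splits an IA32_MCi_STATUS value into its architectural fields. The function name decode_mci_status and the dictionary keys are hypothetical. The bit positions are taken from the machine-check chapter of the Intel SDM, and the corrected-error count is only meaningful if the CPU actually reports it in bits 52:38, which is assumed here. The model-specific code is left undecoded.

```python
# Sketch only: decode architectural fields of an IA32_MCi_STATUS value.
# Bit layout assumed from the Intel SDM machine-check chapter.

# Request types for memory controller errors (MMM field).
MEM_REQUEST = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mci_status(status: int) -> dict:
    """Hypothetical helper: return the status fields as a dict."""
    mca_code = status & 0xFFFF
    fields = {
        "valid": bool((status >> 63) & 1),            # VAL
        "overflow": bool((status >> 62) & 1),         # OVER
        "uncorrected": bool((status >> 61) & 1),      # UC
        "enabled": bool((status >> 60) & 1),          # EN
        "misc_valid": bool((status >> 59) & 1),       # MISCV
        "addr_valid": bool((status >> 58) & 1),       # ADDRV
        "context_corrupt": bool((status >> 57) & 1),  # PCC
        # Assumption: the CPU reports a corrected error count here.
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": mca_code,
    }
    # Compound code 000F 0000 1MMM CCCC marks a memory controller error
    # (F is the filter bit and is ignored for the match).
    if (mca_code & 0xEF80) == 0x0080:
        channel = mca_code & 0xF
        fields["request"] = MEM_REQUEST.get((mca_code >> 4) & 0x7, "reserved")
        fields["channel"] = None if channel == 0xF else channel
    return fields


if __name__ == "__main__":
    # Status words copied from the dmesg output in this mail.
    for word in ("cc027c0000010091", "8c00004000010091", "cc00008000010091"):
        print(word, decode_mci_status(int(word, 16)))
```

Under these assumptions, all three status words decode as valid, corrected (UC clear) memory read errors on channel 1 with an address attached. This is only a reading of the raw bits, not a diagnosis of the hardware or of the kernel.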