On Sat, Sep 09, 2017 at 06:32:25PM +0200, Markus Trippelsdorf wrote:
> Also tried the following patch. It does not help.

Ok, another theory. This one still needs to be fixed properly but that
is for later.

For some reason (insufficient coffee maybe), I mistyped your MCi_STATUS
value earlier. Your mail says it is "fa000010000b0c0f". Do you still
have a screen photo to verify it?

Because if so, the correct error type is:

MC4_STATUS[Val|Over|UC|EN|MiscV|PCC|EEC: Protocol error (link, L3, probe filter) (0x0b)|ET: BUS(pp:OBS;t:NOTIMOUT;r4:GEN;ii:GEN;ll:LG)]: 0xfa000010000b0c0f

And for that I'd need the MC4_ADDR value too.

So can you please apply the patch below on top of the syncflood quirk
patch and retrigger, take a photo of the MCE and send it to me?

Thanks.

---
commit e84e5ad290c7c26af69a721148f404766529509b
Author: Borislav Petkov <b...@suse.de>
Date:   Sat Sep 9 00:55:50 2017 +0200

    x86/MCE/AMD: Collect error info even if valid bits are not set

    The MCA banks log error info into MCA_ADDR, MCA_MISC0, and MCA_SYND even
    if the corresponding valid bits are not set:

    "Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
    MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
    MCA_STATUS[SyndV] are zero."

    Do so by setting those bits so that code down the MCE processing path
    doesn't need to be changed.

    Signed-off-by: Borislav Petkov <b...@suse.de>

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 3b413065c613..c63c7ef326c7 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -436,6 +436,20 @@ static inline void mce_gather_info(struct mce *m, struct pt_regs *regs)
 		if (mca_cfg.rip_msr)
 			m->ip = mce_rdmsrl(mca_cfg.rip_msr);
 	}
+
+	/*
+	 * Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
+	 * MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
+	 * MCA_STATUS[SyndV] are zero.
+	 */
+	if (m->cpuvendor == X86_VENDOR_AMD) {
+		u64 status = MCI_STATUS_ADDRV | MCI_STATUS_MISCV;
+
+		if (mce_flags.smca)
+			status |= MCI_STATUS_SYNDV;
+
+		m->status |= status;
+	}
 }
 
 int mce_available(struct cpuinfo_x86 *c)
-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
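
For reference, the decode of 0xfa000010000b0c0f quoted above can be
reproduced by hand from the raw value. The snippet below is only an
illustrative userspace sketch and is not part of the patch. The ST_*
macros and the helper names are hypothetical; the flag positions mirror
the architectural MCi_STATUS layout (bits 63..57), the bus error code
layout is taken to be 0000 1PPT RRRR IILL, and the 5-bit extended error
code mask is an assumption that may differ per family.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* MCi_STATUS flag bits (hypothetical names, architectural positions). */
#define ST_VAL   (1ULL << 63)
#define ST_OVER  (1ULL << 62)
#define ST_UC    (1ULL << 61)
#define ST_EN    (1ULL << 60)
#define ST_MISCV (1ULL << 59)
#define ST_ADDRV (1ULL << 58)
#define ST_PCC   (1ULL << 57)

/* Extended error code, bits 20:16 (mask width is an assumption). */
static unsigned int status_xec(uint64_t s) { return (s >> 16) & 0x1f; }

/* Low 16 bits match the bus error pattern 0000 1PPT RRRR IILL. */
static int is_bus_error(uint64_t s) { return (s & 0xf800) == 0x0800; }

static unsigned int bus_pp(uint64_t s) { return (s >> 9) & 0x3; } /* participation */
static unsigned int bus_t(uint64_t s)  { return (s >> 8) & 0x1; } /* timeout */
static unsigned int bus_r4(uint64_t s) { return (s >> 4) & 0xf; } /* request type */
static unsigned int bus_ii(uint64_t s) { return (s >> 2) & 0x3; } /* mem/io */
static unsigned int bus_ll(uint64_t s) { return s & 0x3; }        /* cache level */

/* Write the set flags into buf as "Val|Over|...", truncating if needed. */
static void format_flags(uint64_t s, char *buf, size_t len)
{
	static const struct { uint64_t bit; const char *name; } flags[] = {
		{ ST_VAL, "Val" },     { ST_OVER, "Over" },   { ST_UC, "UC" },
		{ ST_EN, "EN" },       { ST_MISCV, "MiscV" }, { ST_ADDRV, "AddrV" },
		{ ST_PCC, "PCC" },
	};
	size_t used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		int n;

		if (!(s & flags[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;
		used += (size_t)n;
	}
}
```

Feeding it the value from the mail shows AddrV clear, which is why
MC4_ADDR would not be collected without the patch above, and the bus
fields come out as pp=2 (OBS), t=0 (NOTIMOUT), r4=0 (GEN), ii=3 (GEN),
ll=3 (LG).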