On Wed, Oct 30, 2024 at 02:57:33PM -0500, Naik, Avadhut wrote: > > > On 10/30/2024 13:01, Borislav Petkov wrote: > > On Wed, Oct 30, 2024 at 05:50:02PM +0100, Borislav Petkov wrote: > >> Bah, crap. Lemme go back and take a second stab at this. > > > > Second try. > > > > The reason why I don't want to expose MCA_CONFIG to userspace is, well, > > userspace doesn't need to know any "management" information the hw gives. It > > either gets FRU text in that tracepoint or it doesn't. But it doesn't need > > to > > know what MCA_CONFIG said or didn't say. > > > > Ok? > > > So, for now, in the kernel, we log SYND1/2 registers only when they contain > FRUText. > While in the userspace, since MCA_CONFIG is not in the picture, we always > interpret SYND1/2 data as FRUText. > Rasdaemon might need to be tweaked accordingly. Will take care of it. > Overall, sounds good. >
Sounds good to me too. Thanks, Yazen > Do you want me send out a revised version with these changes? > > > Author: Yazen Ghannam <yazen.ghan...@amd.com> > > Date: Tue Oct 22 19:36:31 2024 +0000 > > > > EDAC/mce_amd: Add support for FRU text in MCA > > > > A new "FRU Text in MCA" feature is defined where the Field Replaceable > > Unit (FRU) Text for a device is represented by a string in the new > > MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA > > bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]). > > > > The FRU Text is populated dynamically for each individual error state > > (MCA_STATUS, MCA_ADDR, et al.). Handle the case where an MCA bank covers > > multiple devices, for example, a Unified Memory Controller (UMC) bank > > that manages two DIMMs. > > > > [ Yazen: Add Avadhut as co-developer for wrapper changes. ] > > [ bp: Do not expose MCA_CONFIG to userspace yet. ] > > > > Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com> > > Co-developed-by: Avadhut Naik <avadhut.n...@amd.com> > > Signed-off-by: Avadhut Naik <avadhut.n...@amd.com> > > Signed-off-by: Borislav Petkov (AMD) <b...@alien8.de> > > Link: > > https://lore.kernel.org/r/20241022194158.110073-6-avadhut.n...@amd.com > > > > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h > > index 4d936ee20e24..4543cf2eb5e8 100644 > > --- a/arch/x86/include/asm/mce.h > > +++ b/arch/x86/include/asm/mce.h > > @@ -61,6 +61,7 @@ > > * - TCC bit is present in MCx_STATUS. > > */ > > #define MCI_CONFIG_MCAX 0x1 > > +#define MCI_CONFIG_FRUTEXT BIT_ULL(9) > > #define MCI_IPID_MCATYPE 0xFFFF0000 > > #define MCI_IPID_HWID 0xFFF > > > > diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c > > index 194d9fd47d20..50d74d3bf0f5 100644 > > --- a/drivers/edac/mce_amd.c > > +++ b/drivers/edac/mce_amd.c > > @@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long > > val, void *data) > > struct mce *m = (struct mce *)data; > > struct mce_hw_err *err = to_mce_hw_err(m); > > unsigned int fam = x86_family(m->cpuid); > > + u32 mca_config_lo = 0, dummy; > > int ecc; > > > > if (m->kflags & MCE_HANDLED_CEC) > > @@ -814,11 +815,9 @@ amd_decode_mce(struct notifier_block *nb, unsigned > > long val, void *data) > > ((m->status & MCI_STATUS_PCC) ? "PCC" : "-")); > > > > if (boot_cpu_has(X86_FEATURE_SMCA)) { > > - u32 low, high; > > - u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank); > > + rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), &mca_config_lo, > > &dummy); > > > > - if (!rdmsr_safe(addr, &low, &high) && > > - (low & MCI_CONFIG_MCAX)) > > + if (mca_config_lo & MCI_CONFIG_MCAX) > > pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : > > "-")); > > > > pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : > > "-")); > > @@ -853,8 +852,15 @@ amd_decode_mce(struct notifier_block *nb, unsigned > > long val, void *data) > > > > if (m->status & MCI_STATUS_SYNDV) { > > pr_cont(", Syndrome: 0x%016llx\n", m->synd); > > - pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: > > 0x%016llx", > > - err->vendor.amd.synd1, err->vendor.amd.synd2); > > + if (mca_config_lo & MCI_CONFIG_FRUTEXT) { > > + char frutext[17]; > > + > > + frutext[16] = '\0'; > > + memcpy(&frutext[0], &err->vendor.amd.synd1, 8); > > + memcpy(&frutext[8], &err->vendor.amd.synd2, 8); > > + > > + pr_emerg(HW_ERR "FRU Text: %s", frutext); > > + } > > } > > > > pr_cont("\n"); > > > > -- > Thanks, > Avadhut Naik