On Tue, Jun 25, 2024 at 02:56:24PM -0500, Avadhut Naik wrote: > From: Yazen Ghannam <yazen.ghan...@amd.com> > > A new "FRU Text in MCA" feature is defined where the Field Replaceable > Unit (FRU) Text for a device is represented by a string in the new > MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA > bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]). > > The FRU Text is populated dynamically for each individual error state > (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank > covers multiple devices, for example, a Unified Memory Controller (UMC) > bank that manages two DIMMs. >
>From here... > Print the FRU Text string, if available, when decoding an MCA error. > > Also, add field for MCA_CONFIG MSR in struct mce_hw_err as vendor specific > error information and save the value of the MSR. The very value can then be > exported through tracepoint for userspace tools like rasdaemon to print FRU > Text, if available. > > Note: Checkpatch checks/warnings are ignored to maintain coding style. ... to here goes into the trash can except what MCA_CONFIG is for being logged as part of the error. > [Yazen: Add Avadhut as co-developer for wrapper changes. ] > > Co-developed-by: Avadhut Naik <avadhut.n...@amd.com> > Signed-off-by: Avadhut Naik <avadhut.n...@amd.com> > Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com> Ditto as for patch 3. > --- > @@ -853,8 +850,18 @@ amd_decode_mce(struct notifier_block *nb, unsigned long > val, void *data) > > if (m->status & MCI_STATUS_SYNDV) { > pr_cont(", Syndrome: 0x%016llx\n", m->synd); > - pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: > 0x%016llx", > - err->vi.amd.synd1, err->vi.amd.synd2); > + if (mca_config & MCI_CONFIG_FRUTEXT) { > + char frutext[17]; > + > + memset(frutext, 0, sizeof(frutext)); Why are you clearing it if you're overwriting it immediately? > + memcpy(&frutext[0], &err->vi.amd.synd1, 8); > + memcpy(&frutext[8], &err->vi.amd.synd2, 8); > + > + pr_emerg(HW_ERR "FRU Text: %s", frutext); > + } else { > + pr_emerg(HW_ERR "Syndrome1: 0x%016llx, > Syndrome2: 0x%016llx", > + err->vi.amd.synd1, err->vi.amd.synd2); > + } > } > > pr_cont("\n"); > -- > 2.34.1 > -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette