On 10/30/2024 13:01, Borislav Petkov wrote:
> On Wed, Oct 30, 2024 at 05:50:02PM +0100, Borislav Petkov wrote:
>> Bah, crap. Lemme go back and take a second stab at this.
> 
> Second try.
> 
> The reason why I don't want to expose MCA_CONFIG to userspace is, well,
> userspace doesn't need to know any "management" information the hw gives. It
> either gets FRU text in that tracepoint or it doesn't. But it doesn't need to
> know what MCA_CONFIG said or didn't say.
> 
> Ok?
> 
So, for now, in the kernel, we log SYND1/2 registers only when they contain
FRUText.
While in the userspace, since MCA_CONFIG is not in the picture, we always
interpret SYND1/2 data as FRUText.
Rasdaemon might need to be tweaked accordingly. Will take care of it.
Overall, sounds good.

Do you want me send out a revised version with these changes?

> Author: Yazen Ghannam <yazen.ghan...@amd.com>
> Date:   Tue Oct 22 19:36:31 2024 +0000
> 
>     EDAC/mce_amd: Add support for FRU text in MCA
>     
>     A new "FRU Text in MCA" feature is defined where the Field Replaceable
>     Unit (FRU) Text for a device is represented by a string in the new
>     MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
>     bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
>     
>     The FRU Text is populated dynamically for each individual error state
>     (MCA_STATUS, MCA_ADDR, et al.). Handle the case where an MCA bank covers
>     multiple devices, for example, a Unified Memory Controller (UMC) bank
>     that manages two DIMMs.
>     
>       [ Yazen: Add Avadhut as co-developer for wrapper changes. ]
>       [ bp: Do not expose MCA_CONFIG to userspace yet. ]
>     
>     Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com>
>     Co-developed-by: Avadhut Naik <avadhut.n...@amd.com>
>     Signed-off-by: Avadhut Naik <avadhut.n...@amd.com>
>     Signed-off-by: Borislav Petkov (AMD) <b...@alien8.de>
>     Link: 
> https://lore.kernel.org/r/20241022194158.110073-6-avadhut.n...@amd.com
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 4d936ee20e24..4543cf2eb5e8 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -61,6 +61,7 @@
>   *  - TCC bit is present in MCx_STATUS.
>   */
>  #define MCI_CONFIG_MCAX              0x1
> +#define MCI_CONFIG_FRUTEXT   BIT_ULL(9)
>  #define MCI_IPID_MCATYPE     0xFFFF0000
>  #define MCI_IPID_HWID                0xFFF
>  
> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> index 194d9fd47d20..50d74d3bf0f5 100644
> --- a/drivers/edac/mce_amd.c
> +++ b/drivers/edac/mce_amd.c
> @@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long 
> val, void *data)
>       struct mce *m = (struct mce *)data;
>       struct mce_hw_err *err = to_mce_hw_err(m);
>       unsigned int fam = x86_family(m->cpuid);
> +     u32 mca_config_lo = 0, dummy;
>       int ecc;
>  
>       if (m->kflags & MCE_HANDLED_CEC)
> @@ -814,11 +815,9 @@ amd_decode_mce(struct notifier_block *nb, unsigned long 
> val, void *data)
>               ((m->status & MCI_STATUS_PCC)   ? "PCC"   : "-"));
>  
>       if (boot_cpu_has(X86_FEATURE_SMCA)) {
> -             u32 low, high;
> -             u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
> +             rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), &mca_config_lo, 
> &dummy);
>  
> -             if (!rdmsr_safe(addr, &low, &high) &&
> -                 (low & MCI_CONFIG_MCAX))
> +             if (mca_config_lo & MCI_CONFIG_MCAX)
>                       pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : 
> "-"));
>  
>               pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : 
> "-"));
> @@ -853,8 +852,15 @@ amd_decode_mce(struct notifier_block *nb, unsigned long 
> val, void *data)
>  
>               if (m->status & MCI_STATUS_SYNDV) {
>                       pr_cont(", Syndrome: 0x%016llx\n", m->synd);
> -                     pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 
> 0x%016llx",
> -                              err->vendor.amd.synd1, err->vendor.amd.synd2);
> +                     if (mca_config_lo & MCI_CONFIG_FRUTEXT) {
> +                             char frutext[17];
> +
> +                             frutext[16] = '\0';
> +                             memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
> +                             memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
> +
> +                             pr_emerg(HW_ERR "FRU Text: %s", frutext);
> +                     }
>               }
>  
>               pr_cont("\n");
> 

-- 
Thanks,
Avadhut Naik

Reply via email to