* Borislav Petkov <b...@alien8.de> wrote:

> From: Borislav Petkov <b...@suse.de>
> 
> Some F14h machines have an erratum which, "under a highly specific
> and detailed set of internal timing conditions" can lead to skipping
> instructions and rIP corruption. Add the fix for those machines when
> their BIOS doesn't apply it or there simply isn't a BIOS update for them.
> 
> Signed-off-by: Borislav Petkov <b...@suse.de>
> Tested-by: <m...@protonmail.ch>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=197285
> Cc: Sherry Hurwitz <sherry.hurw...@amd.com>
> Cc: Yazen Ghannam <yazen.ghan...@amd.com>
> Cc: <sta...@vger.kernel.org>
> ---
>  arch/x86/kernel/amd_nb.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index 458da8509b75..7ad1dfc8f40e 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -27,6 +27,8 @@ static const struct pci_device_id amd_root_ids[] = {
>       {}
>  };
>  
> +#define PCI_DEVICE_ID_AMD_CNB17H_F4     0x1704
> +
>  const struct pci_device_id amd_nb_misc_ids[] = {
>       { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB_MISC) },
>       { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_10H_NB_MISC) },
> @@ -37,6 +39,7 @@ const struct pci_device_id amd_nb_misc_ids[] = {
>       { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_NB_F3) },
>       { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_M30H_NB_F3) },
>       { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_DF_F3) },
> +     { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F3) },
>       {}
>  };
>  EXPORT_SYMBOL_GPL(amd_nb_misc_ids);
> @@ -48,6 +51,7 @@ static const struct pci_device_id amd_nb_link_ids[] = {
>       { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_NB_F4) },
>       { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_M30H_NB_F4) },
>       { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_DF_F4) },
> +     { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F4) },
>       {}
>  };
>  
> @@ -402,11 +406,46 @@ void amd_flush_garts(void)
>  }
>  EXPORT_SYMBOL_GPL(amd_flush_garts);
>  
> +static void __fix_erratum_688(void *info)
> +{
> +#define MSR_AMD64_IC_CFG 0xC0011021
> +
> +     msr_set_bit(MSR_AMD64_IC_CFG, 3);
> +     msr_set_bit(MSR_AMD64_IC_CFG, 14);
> +}
> +
> +/* Apply erratum 688 fix so machines without a BIOS fix work. */
> +static __init void fix_erratum_688(void)
> +{
> +     struct pci_dev *F4;
> +     u32 val;
> +
> +     if (boot_cpu_data.x86 != 0x14)
> +             return;
> +
> +     if (!amd_northbridges.num)
> +             return;
> +
> +     F4 = node_to_amd_nb(0)->link;
> +     if (!F4)
> +             return;
> +
> +     if (pci_read_config_dword(F4, 0x164, &val))
> +             return;
> +
> +     if (val & BIT(2))
> +             return;
> +
> +     on_each_cpu(__fix_erratum_688, NULL, 0);

Any objections to me adding a printk message that we applied a fix?

        pr_info("x86/cpu/AMD: CPU erratum 688 worked around\n");

or so?

That would also create some pressure for customers to prod manufacturers to
prod BIOS makers to fix the erratum in a BIOS update or so.

Plus, in the unlikely event that the workaround did not take effect due to
some other erratum, or the erratum was mis-documented, we'd eventually
discover that as well.
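
To make the suggestion concrete, here is a rough userspace sketch of the flow
with the message printed only when the workaround is actually applied. It is
not the kernel code: the helper names (set_bit64, erratum_688_needs_fix,
erratum_688_fixed_ic_cfg, erratum_688_workaround) are hypothetical, the
register values are passed in as plain integers instead of being read from PCI
config space or an MSR, and printf() stands in for pr_info(). The only details
taken from the patch are the bit positions: the early return when bit 2 of F4
register 0x164 is set, and bits 3 and 14 being set in MSR_AMD64_IC_CFG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
#define F4_REG_0x164_SKIP_BIT 2	/* bit the patch tests in F4 config reg 0x164 */

static uint64_t set_bit64(uint64_t val, unsigned int bit)
{
	assert(bit < 64);
	return val | (UINT64_C(1) << bit);
}

/* Mirrors the patch's early return on "val & BIT(2)". */
static bool erratum_688_needs_fix(uint32_t f4_reg_0x164)
{
	return !(f4_reg_0x164 & (UINT32_C(1) << F4_REG_0x164_SKIP_BIT));
}

/* Models __fix_erratum_688(): set bits 3 and 14 of the IC_CFG value. */
static uint64_t erratum_688_fixed_ic_cfg(uint64_t ic_cfg)
{
	ic_cfg = set_bit64(ic_cfg, 3);
	ic_cfg = set_bit64(ic_cfg, 14);
	return ic_cfg;
}

/*
 * Returns the resulting IC_CFG value. *applied reports whether the
 * workaround ran; the message is printed only in that case.
 */
static uint64_t erratum_688_workaround(uint32_t f4_reg_0x164, uint64_t ic_cfg,
				       bool *applied)
{
	*applied = erratum_688_needs_fix(f4_reg_0x164);
	if (!*applied)
		return ic_cfg;

	ic_cfg = erratum_688_fixed_ic_cfg(ic_cfg);
	printf("x86/cpu/AMD: CPU erratum 688 worked around\n");
	return ic_cfg;
}
```

In the real patch the message would presumably go right after the
on_each_cpu() call, so it shows up once per boot rather than once per CPU.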

Thanks,

        Ingo
