[PATCH] x86/AMD: Fix Socket ID for LLC topology for AMD Fam17h systems
The Socket ID is ApicId[bits] on Fam17h systems. Change substraction to logical AND when extracting socket_id from c->apicid. Signed-off-by: Yazen Ghannam --- arch/x86/kernel/cpu/amd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index f5c69d8..479555f 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -365,7 +365,7 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c) if (c->x86 != 0x17 || !cpuid_edx(0x8006)) return; - socket_id = (c->apicid >> bits) - 1; + socket_id = (c->apicid >> bits) & 1; core_complex_id = (c->apicid & ((1 << bits) - 1)) >> 3; per_cpu(cpu_llc_id, cpu) = (socket_id << 3) | core_complex_id; -- 1.9.1
[PATCH v2] x86/mce: Always save severity in machine_check_poll
From: Yazen Ghannam The severity gives a hint as to how to handle the error. The notifier blocks can then use the severity to decide on an action. It's not necessary for machine_check_poll() to filter errors for the notifier chain, since each block will check its own set of conditions before handling an error. Also, there isn't any urgency for machine_check_poll() to make decisions based on severity like in do_machine_check(). If we can assume that a severity is set then we can use them in more notifier blocks. For example, the CEC block can check for a "KEEP" severity rather than checking bits in the status. This isn't possible now since the severity is not set except for "DEFFRRED/UCNA" errors with a valid address. Save the severity since we have it, and let the notifier blocks decide if they want to do anything. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/1497286446-59533-1-git-send-email-yazen.ghan...@amd.com v1->v2: * Expand commit message. arch/x86/kernel/cpu/mcheck/mce.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index b58b77808ce4..6dde0497efc7 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -673,7 +673,6 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b) { bool error_seen = false; struct mce m; - int severity; int i; this_cpu_inc(mce_poll_count); @@ -710,11 +709,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b) mce_read_aux(&m, i); - severity = mce_severity(&m, mca_cfg.tolerant, NULL, false); - - if (severity == MCE_DEFERRED_SEVERITY && mce_is_memory_error(&m)) - if (m.status & MCI_STATUS_ADDRV) - m.severity = severity; + m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false); /* * Don't get the IP here because it's unlikely to -- 2.7.4
[PATCH v2] x86/MCE/AMD: Always give PANIC severity for UC errors IN_KERNEL context
From: Yazen Ghannam The AMD severity grading function was introduced in v4.1 and has remained logically unchanged with the exception of a separate SMCA severity grading function for SMCA systems. The current logic can possibly give MCE_AR_SEVERITY for uncorrectable errors in kernel context. The system may then get stuck in a loop as memory_failure() will try to handle the bad kernel memory and find it busy. Return MCE_PANIC_SEVERITY for all UC errors IN_KERNEL context on AMD systems. After: b2f9d678e28c ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries") was accepted in v4.6, this issue was masked because of the tail-end attempt at kernel mode recovery in the #MC handler. However, uncorrectable errors IN_KERNEL context should always be considered unrecoverable and cause a panic. Fixes: bf80bbd7dcf5 (x86/mce: Add an AMD severities-grading function) Signed-off-by: Yazen Ghannam [ This needs to be reworked to apply to v4.1 and v4.4 stable branches.] Cc: # 4.9.x --- Link: https://lkml.kernel.org/r/1505830031-9630-1-git-send-email-yazen.ghan...@amd.com v1->v2: * Update commit message. arch/x86/kernel/cpu/mcheck/mce-severity.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c index 2773c8547f69..f5518706baa6 100644 --- a/arch/x86/kernel/cpu/mcheck/mce-severity.c +++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c @@ -245,6 +245,9 @@ static int mce_severity_amd(struct mce *m, int tolerant, char **msg, bool is_exc if (m->status & MCI_STATUS_UC) { + if (ctx == IN_KERNEL) + return MCE_PANIC_SEVERITY; + /* * On older systems where overflow_recov flag is not present, we * should simply panic if an error overflow occurs. If @@ -255,10 +258,6 @@ static int mce_severity_amd(struct mce *m, int tolerant, char **msg, bool is_exc if (mce_flags.smca) return mce_severity_amd_smca(m, ctx); - /* software can try to contain */ - if (!(m->mcgstatus & MCG_STATUS_RIPV) && (ctx == IN_KERNEL)) - return MCE_PANIC_SEVERITY; - /* kill current process */ return MCE_AR_SEVERITY; } else { -- 2.7.4
[PATCH] x86/mce/AMD: Fix mce_severity_amd_smca() signature
From: Yazen Ghannam Change the err_ctx type to "enum context" to match the type passed in. Suggested-by: Borislav Petkov Signed-off-by: Yazen Ghannam --- arch/x86/kernel/cpu/mcheck/mce-severity.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c index f5518706baa6..267311a7fc60 100644 --- a/arch/x86/kernel/cpu/mcheck/mce-severity.c +++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c @@ -204,7 +204,7 @@ static int error_context(struct mce *m) return IN_KERNEL; } -static int mce_severity_amd_smca(struct mce *m, int err_ctx) +static int mce_severity_amd_smca(struct mce *m, enum context err_ctx) { u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank); u32 low, high; -- 2.7.4
[PATCH 1/2] x86/MCE/AMD: Check for NULL banks in THR interrupt handler
From: Yazen Ghannam If threshold_init_device() fails then per_cpu(threshold_banks) will be deallocated. The thresholding interrupt handler will still be active, so it's possible to get a NULL pointer dereference if a THR interrupt happens and any of the structures are NULL. Exit the handler if per_cpu(threshold_banks) is NULL and skip NULL banks. MCA error information will still be in the registers. The information will be logged during polling or in another MCA exception or interrupt handler. Fixes: 17ef4af0ec0f ("x86/mce/AMD: Use saved threshold block info in interrupt handler") Cc: # 4.13.x Signed-off-by: Yazen Ghannam --- arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index dd33c357548f..2dbf34250bbf 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -934,13 +934,21 @@ static void log_and_reset_block(struct threshold_block *block) static void amd_threshold_interrupt(void) { struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL; + struct threshold_bank *th_bank = NULL; unsigned int bank, cpu = smp_processor_id(); + if (!per_cpu(threshold_banks, cpu)) + return; + for (bank = 0; bank < mca_cfg.banks; ++bank) { if (!(per_cpu(bank_map, cpu) & (1 << bank))) continue; - first_block = per_cpu(threshold_banks, cpu)[bank]->blocks; + th_bank = per_cpu(threshold_banks, cpu)[bank]; + if (!th_bank) + continue; + + first_block = th_bank->blocks; if (!first_block) continue; -- 2.17.1
[PATCH 2/2] x86/MCE/AMD: Skip creating kobjects with NULL names
From: Yazen Ghannam During mce_threshold_create_device() data structures are allocated for each CPUs MCA banks and thresholding blocks. These data structures are used to save information related to AMD's MCA Error Thresholding feature. The structures are used in the thresholding interrupt handler, and they are exposed to the user through sysfs. The sysfs interface has user-friendly names for each bank. However, errors in mce_threshold_create_device() will cause all the data structures to be deallocated. This will break the thresholding interrupt handler since it depends on these structures. One possible error is creating a kobject with a NULL name. This will happen if a bank exists on a system that doesn't have a name, e.g. new bank types on future systems. Skip creating kobjects for banks without a name. This means that the sysfs interface for this bank will not exist. But this will keep all the data structures allocated, so the thresholding interrupt handler will work, even for the unnamed bank. Also, the sysfs interface will still be populated for all existing, known bank types. Cc: # 4.13.x Signed-off-by: Yazen Ghannam --- arch/x86/kernel/cpu/mcheck/mce_amd.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index 2dbf34250bbf..521fd8f406df 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -1130,6 +1130,7 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank, struct threshold_block *b = NULL; u32 low, high; int err; + const char *name = NULL; if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS)) return 0; @@ -1176,9 +1177,13 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank, per_cpu(threshold_banks, cpu)[bank]->blocks = b; } + name = get_name(bank, b); + if (!name) + goto recurse; + err = kobject_init_and_add(&b->kobj, &threshold_ktype, per_cpu(threshold_banks, cpu)[bank]->kobj, - get_name(bank, b)); + name); if (err) goto out_free; recurse: @@ -1265,12 +1270,16 @@ static int threshold_create_bank(unsigned int cpu, unsigned int bank) goto out; } + if (!name) + goto allocate; + b->kobj = kobject_create_and_add(name, &dev->kobj); if (!b->kobj) { err = -EINVAL; goto out_free; } +allocate: per_cpu(threshold_banks, cpu)[bank] = b; if (is_shared_bank(bank)) { -- 2.17.1
[PATCH] x86/mce: Handle varying MCA bank counts
From: Yazen Ghannam Linux reads MCG_CAP[Count] to find the number of MCA banks visible to a CPU. Currently, this is assumed to be the same for all CPUs and a warning is shown if there is a difference. The number of banks is overwritten with the MCG_CAP[Count] value of each following CPU that boots. According to the Intel SDM and AMD APM, the MCG_CAP[Count] value gives the number of banks that are available to a "processor implementation". The AMD BKDGs/PPRs further clarify that this value is per core. This value has historically been the same for every core in the system, but that is not an architectural requirement. Future AMD systems may have different MCG_CAP[Count] values per core, so the assumption that all CPUs will have the same MCG_CAP[Count] value will no longer be valid. Also, the first CPU to boot will allocate the struct mce_banks[] array using the number of banks based on its MCG_CAP[Count] value. The machine check handler and other functions use the global number of banks to iterate and index into the mce_banks[] array. So it's possible to use an out-of-bounds index on an asymmetric system where a following CPU sees a MCG_CAP[Count] value greater than its predecessors. For example, CPU0 sees MCG_CAP[Count]=2. It sets mca_cfg.banks=2 and allocates mce_banks[] with 2 elements. CPU1 sees MCG_CAP[Count]=3 and sets mca_cfg.banks=3, but mce_banks[] is already allocated and remains having 2 elements. Allocate the mce_banks[] array to the maximum number of banks. This will avoid the potential out-of-bounds index since we cap the value of mca_cfg.banks to MAX_NR_BANKS. Set the value of mca_cfg.banks equal to the max of the previous value and the value for the current CPU. This way mca_cfg.banks will always represent the max number of banks detected on any CPU in the system. This will ensure that all CPUs will access all the banks that are visible to them. A CPU that can access fewer than the max number of banks will find the registers of the extra banks to be read-as-zero. Print the number of MCA banks that we're using. Do this in mcheck_late_init() so that we print the final value after all CPUs have been initialized. Get bank count from target CPU when doing injection with mce-inject module. Signed-off-by: Yazen Ghannam --- arch/x86/kernel/cpu/mcheck/mce-inject.c | 14 +++--- arch/x86/kernel/cpu/mcheck/mce.c| 21 +++-- 2 files changed, 14 insertions(+), 21 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce-inject.c b/arch/x86/kernel/cpu/mcheck/mce-inject.c index c805a06e14c3..5dda56d56dd3 100644 --- a/arch/x86/kernel/cpu/mcheck/mce-inject.c +++ b/arch/x86/kernel/cpu/mcheck/mce-inject.c @@ -46,8 +46,6 @@ static struct mce i_mce; static struct dentry *dfs_inj; -static u8 n_banks; - #define MAX_FLAG_OPT_SIZE 4 #define NBCFG 0x44 @@ -567,9 +565,15 @@ static void do_inject(void) static int inj_bank_set(void *data, u64 val) { struct mce *m = (struct mce *)data; + u64 cap; + u8 n_banks; + + /* Get bank count on target CPU so we can handle non-uniform values. */ + rdmsrl_on_cpu(m->extcpu, MSR_IA32_MCG_CAP, &cap); + n_banks = cap & MCG_BANKCNT_MASK; if (val >= n_banks) { - pr_err("Non-existent MCE bank: %llu\n", val); + pr_err("MCA bank %llu non-existent on CPU%d\n", val, m->extcpu); return -EINVAL; } @@ -659,10 +663,6 @@ static struct dfs_node { static int __init debugfs_init(void) { unsigned int i; - u64 cap; - - rdmsrl(MSR_IA32_MCG_CAP, cap); - n_banks = cap & MCG_BANKCNT_MASK; dfs_inj = debugfs_create_dir("mce-inject", NULL); if (!dfs_inj) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 4b767284b7f5..4238c65a0cce 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -1479,13 +1479,12 @@ EXPORT_SYMBOL_GPL(mce_notify_irq); static int __mcheck_cpu_mce_banks_init(void) { int i; - u8 num_banks = mca_cfg.banks; - mce_banks = kcalloc(num_banks, sizeof(struct mce_bank), GFP_KERNEL); + mce_banks = kcalloc(MAX_NR_BANKS, sizeof(struct mce_bank), GFP_KERNEL); if (!mce_banks) return -ENOMEM; - for (i = 0; i < num_banks; i++) { + for (i = 0; i < MAX_NR_BANKS; i++) { struct mce_bank *b = &mce_banks[i]; b->ctl = -1ULL; @@ -1499,24 +1498,16 @@ static int __mcheck_cpu_mce_banks_init(void) */ static int __mcheck_cpu_cap_init(void) { - unsigned b; + u8 b; u64 cap; rdmsrl(MSR_IA32_MCG_CAP, cap); b = cap & MCG_BANKCNT_MASK; - if (!mca_cfg.banks) - pr_info("CPU supports %d MCE banks\n", b); - - if (b > MAX_NR_BANKS) { - pr_warn("Using only
[PATCH] PCI/ACPI: Disable AER when _OSC control bit is clear.
From: Yazen Ghannam Currently, aer_service_init() checks if AER is available and that Firmware First handling is not enabled. The _OSC request for AER is not taken into account when deciding to enable AER in Linux. We should check that the _OSC control for AER is set. If it's not then AER should be disabled. The _OSC control for AER is not requested when APEI Firmware First is used, so the same condition applies. Mark AER as disabled if the _OSC request was not made or accepted. Remove redunant check for aer_acpi_firmware_first() when calling aer_service_init(), since this is check is already included when checking the _OSC control. Signed-off-by: Yazen Ghannam --- drivers/acpi/pci_root.c | 3 +++ drivers/pci/pcie/aer/aerdrv.c | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c index 6fc204a52493..19a625ed8de9 100644 --- a/drivers/acpi/pci_root.c +++ b/drivers/acpi/pci_root.c @@ -512,6 +512,9 @@ static void negotiate_os_control(struct acpi_pci_root *root, int *no_aspm) */ *no_aspm = 1; } + + if (!(requested & control & OSC_PCI_EXPRESS_AER_CONTROL)) + pci_no_aer(); } static int acpi_pci_root_add(struct acpi_device *device, diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c index 6ff5f5b4f5e6..39bb059777d0 100644 --- a/drivers/pci/pcie/aer/aerdrv.c +++ b/drivers/pci/pcie/aer/aerdrv.c @@ -374,7 +374,7 @@ static void aer_error_resume(struct pci_dev *dev) */ static int __init aer_service_init(void) { - if (!pci_aer_available() || aer_acpi_firmware_first()) + if (!pci_aer_available()) return -ENXIO; return pcie_port_service_register(&aerdriver); } -- 2.14.1
[PATCH 3/3] x86/MCE/AMD: Get address from already initialized block
From: Yazen Ghannam The block address is saved after the block is initialized when threshold_init_device() is called. Use the saved block address, if available, rather than trying to rediscover it. We can avoid some *on_cpu() calls in the init path that will cause a call trace when resuming from suspend. Cc: # 4.14.x Signed-off-by: Yazen Ghannam --- arch/x86/kernel/cpu/mcheck/mce_amd.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index bf53b4549a17..8c4f8f30c779 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -436,6 +436,21 @@ static u32 get_block_address(unsigned int cpu, u32 current_addr, u32 low, u32 hi { u32 addr = 0, offset = 0; + if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS)) + return addr; + + /* Get address from already initialized block. */ + if (per_cpu(threshold_banks, cpu)) { + struct threshold_bank *bankp = per_cpu(threshold_banks, cpu)[bank]; + + if (bankp && bankp->blocks) { + struct threshold_block *blockp = &bankp->blocks[block]; + + if (blockp) + return blockp->address; + } + } + if (mce_flags.smca) { if (smca_get_bank_type(bank) == SMCA_RESERVED) return addr; -- 2.14.1
[PATCH 1/3] x86/MCE/AMD: Redo function to get SMCA bank type
From: Yazen Ghannam Pass the bank number to smca_get_bank_type() since that's all we need. Also, we should compare the bank number to the size of the smca_banks array not the number of bank types. Bank types are reused for multiple banks, so the number of types can be different from the number of banks in a system. We could return an invalid bank type. Use smca_get_bank_type() in get_name(), and change type of bank_type variable to match return type of smca_get_bank_type(). Cc: # 4.14.x: 11cf887728a3 x86/MCE/AMD: Define a function to get SMCA bank type Cc: # 4.14.x: c6708d50f166 x86/MCE: Report only DRAM ECC as memory errors on AMD systems Cc: # 4.14.x Signed-off-by: Yazen Ghannam --- arch/x86/kernel/cpu/mcheck/mce_amd.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index 0f32ad242324..4e16afc0794d 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -110,14 +110,14 @@ const char *smca_get_long_name(enum smca_bank_types t) } EXPORT_SYMBOL_GPL(smca_get_long_name); -static enum smca_bank_types smca_get_bank_type(struct mce *m) +static enum smca_bank_types smca_get_bank_type(unsigned int bank) { struct smca_bank *b; - if (m->bank >= N_SMCA_BANK_TYPES) + if (bank >= ARRAY_SIZE(smca_banks)) return N_SMCA_BANK_TYPES; - b = &smca_banks[m->bank]; + b = &smca_banks[bank]; if (!b->hwid) return N_SMCA_BANK_TYPES; @@ -760,7 +760,7 @@ bool amd_mce_is_memory_error(struct mce *m) u8 xec = (m->status >> 16) & 0x1f; if (mce_flags.smca) - return smca_get_bank_type(m) == SMCA_UMC && xec == 0x0; + return smca_get_bank_type(m->bank) == SMCA_UMC && xec == 0x0; return m->bank == 4 && xec == 0x8; } @@ -1063,7 +1063,7 @@ static struct kobj_type threshold_ktype = { static const char *get_name(unsigned int bank, struct threshold_block *b) { - unsigned int bank_type; + enum smca_bank_types bank_type; if (!mce_flags.smca) { if (b && bank == 4) @@ -1072,11 +1072,10 @@ static const char *get_name(unsigned int bank, struct threshold_block *b) return th_names[bank]; } - if (!smca_banks[bank].hwid) + bank_type = smca_get_bank_type(bank); + if (bank_type >= N_SMCA_BANK_TYPES) return NULL; - bank_type = smca_banks[bank].hwid->bank_type; - if (b && bank_type == SMCA_UMC) { if (b->block < ARRAY_SIZE(smca_umc_block_names)) return smca_umc_block_names[b->block]; -- 2.14.1
[PATCH 2/3] x86/MCE/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type
From: Yazen Ghannam Currently, bank 4 is reserved on Fam17h, so we chose not to initialize bank 4 in the smca_banks array. This means that when we check if a bank is initialized, like during boot or resume, we will see that bank 4 is not initialized and try to initialize it. This may cause a call trace, when resuming from suspend, due to *on_cpu() calls in the init path. Reserved banks will be read-as-zero, so their MCA_IPID register will be zero. So, like the smca_banks array, the threshold_banks array will not have an entry for a reserved bank since all its MCA_MISC* registers will be zero. Enumerate a "Reserved" bank type that matches on a HWID_MCATYPE of 0,0. Use the "Reserved" type when checking if a bank is reserved. It's possible that other bank numbers may be reserved on future systems. Don't try to find the block address on reserved banks. Cc: # 4.14.x Signed-off-by: Yazen Ghannam --- arch/x86/include/asm/mce.h | 1 + arch/x86/kernel/cpu/mcheck/mce_amd.c | 7 +++ drivers/edac/mce_amd.c | 11 +++ 3 files changed, 15 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index 96ea4b5ba658..340070415c2c 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -346,6 +346,7 @@ enum smca_bank_types { SMCA_IF,/* Instruction Fetch */ SMCA_L2_CACHE, /* L2 Cache */ SMCA_DE,/* Decoder Unit */ + SMCA_RESERVED, /* Reserved */ SMCA_EX,/* Execution Unit */ SMCA_FP,/* Floating Point */ SMCA_L3_CACHE, /* L3 Cache */ diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index 4e16afc0794d..bf53b4549a17 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -82,6 +82,7 @@ static struct smca_bank_name smca_names[] = { [SMCA_IF] = { "insn_fetch", "Instruction Fetch Unit" }, [SMCA_L2_CACHE] = { "l2_cache", "L2 Cache" }, [SMCA_DE] = { "decode_unit", "Decode Unit" }, + [SMCA_RESERVED] = { "reserved", "Reserved" }, [SMCA_EX] = { "execution_unit", "Execution Unit" }, [SMCA_FP] = { "floating_point", "Floating Point Unit" }, [SMCA_L3_CACHE] = { "l3_cache", "L3 Cache" }, @@ -127,6 +128,9 @@ static enum smca_bank_types smca_get_bank_type(unsigned int bank) static struct smca_hwid smca_hwid_mcatypes[] = { /* { bank_type, hwid_mcatype, xec_bitmap } */ + /* Reserved type */ + { SMCA_RESERVED, HWID_MCATYPE(0x00, 0x0), 0x0 }, + /* ZN Core (HWID=0xB0) MCA types */ { SMCA_LS, HWID_MCATYPE(0xB0, 0x0), 0x1FFFEF }, { SMCA_IF, HWID_MCATYPE(0xB0, 0x1), 0x3FFF }, @@ -433,6 +437,9 @@ static u32 get_block_address(unsigned int cpu, u32 current_addr, u32 low, u32 hi u32 addr = 0, offset = 0; if (mce_flags.smca) { + if (smca_get_bank_type(bank) == SMCA_RESERVED) + return addr; + if (!block) { addr = MSR_AMD64_SMCA_MCx_MISC(bank); } else { diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c index a11a671c7a38..2ab4d61ee47e 100644 --- a/drivers/edac/mce_amd.c +++ b/drivers/edac/mce_amd.c @@ -854,21 +854,24 @@ static void decode_mc6_mce(struct mce *m) static void decode_smca_error(struct mce *m) { struct smca_hwid *hwid; - unsigned int bank_type; + enum smca_bank_types bank_type; const char *ip_name; u8 xec = XEC(m->status, xec_mask); if (m->bank >= ARRAY_SIZE(smca_banks)) return; - if (x86_family(m->cpuid) >= 0x17 && m->bank == 4) - pr_emerg(HW_ERR "Bank 4 is reserved on Fam17h.\n"); - hwid = smca_banks[m->bank].hwid; if (!hwid) return; bank_type = hwid->bank_type; + + if (bank_type == SMCA_RESERVED) { + pr_emerg(HW_ERR "Bank %d is reserved.\n", m->bank); + return; + } + ip_name = smca_get_long_name(bank_type); pr_emerg(HW_ERR "%s Extended Error Code: %d\n", ip_name, xec); -- 2.14.1
[PATCH] ACPI / processor_idle: Set default C1 state description
From: Yazen Ghannam The acpi_idle driver will default to ACPI_CSTATE_HALT for C1 if a _CST object for C1 is not defined. However, the description will not be set, so users will see "" when reading the description from sysfs. Set the C1 state description when defaulting to ACPI_CSTATE_HALT. Signed-off-by: Yazen Ghannam Cc: sta...@vger.kernel.org --- drivers/acpi/processor_idle.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c index 5f0071c7e2e1..abb559cd28d7 100644 --- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -292,6 +292,9 @@ static int acpi_processor_get_power_info_default(struct acpi_processor *pr) pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1; pr->power.states[ACPI_STATE_C1].valid = 1; pr->power.states[ACPI_STATE_C1].entry_method = ACPI_CSTATE_HALT; + + snprintf(pr->power.states[ACPI_STATE_C1].desc, +ACPI_CX_DESC_LEN, "ACPI HLT"); } /* the C0 state only exists as a filler in our array */ pr->power.states[ACPI_STATE_C0].valid = 1; -- 2.14.1
[PATCH v2] PCI/ACPI: Disable AER when _OSC control bit is clear.
From: Yazen Ghannam Currently, aer_service_init() checks if AER is available and that Firmware First handling is not enabled. The _OSC request for AER is not taken into account when deciding to enable AER in Linux. >From ACPI 6.2 Section 6.2.11.3, "If any bits in the Control Field are returned cleared (masked to zero) by the _OSC control method, the respective feature is designated unsupported by the platform and must not be enabled by the OS." The OS and the Platform should agree that the OS can have control of AER otherwise we should disable AER in the OS. Mark AER as disabled if the _OSC request was not made or accepted. This covers two cases where the OS and Platform disagree: 1) The OS requests AER control and Platform denies the request. 2) The OS does not request AER control but the Platform returns the AER control bit set, possibly due to a Firmware bug. The _OSC control for AER is not requested when APEI Firmware First is used, so the same condition applies from case 2 above. Remove redundant check for aer_acpi_firmware_first() when calling aer_service_init(), since this check is already included when checking the _OSC control. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/2018050316.19951-1-yazen.ghan...@amd.com v1->v2: * Expand commit message. * Add Spec reference to commit message. * Fix spelling error in commit message. * Add comment for 3-way bitwise AND. drivers/acpi/pci_root.c | 7 +++ drivers/pci/pcie/aer/aerdrv.c | 2 +- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c index 6fc204a52493..ab0192fd24c7 100644 --- a/drivers/acpi/pci_root.c +++ b/drivers/acpi/pci_root.c @@ -512,6 +512,13 @@ static void negotiate_os_control(struct acpi_pci_root *root, int *no_aspm) */ *no_aspm = 1; } + + /* +* We can use a 3-way bitwise AND to check that the AER control bit is +* both requested by the OS and granted by the Platform. +*/ + if (!(requested & control & OSC_PCI_EXPRESS_AER_CONTROL)) + pci_no_aer(); } static int acpi_pci_root_add(struct acpi_device *device, diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c index 6ff5f5b4f5e6..39bb059777d0 100644 --- a/drivers/pci/pcie/aer/aerdrv.c +++ b/drivers/pci/pcie/aer/aerdrv.c @@ -374,7 +374,7 @@ static void aer_error_resume(struct pci_dev *dev) */ static int __init aer_service_init(void) { - if (!pci_aer_available() || aer_acpi_firmware_first()) + if (!pci_aer_available()) return -ENXIO; return pcie_port_service_register(&aerdriver); } -- 2.14.1
[PATCH v3 1/8] efi: Fix IA32/X64 Processor Error Record definition
From: Yazen Ghannam Based on UEFI 2.7 Table 252. Processor Error Record, the "Local APIC_ID" field is 8 bytes but Linux defines this field as 1 byte. Fix this in the struct cper_sec_proc_ia definition. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180226193904.20532-2-yazen.ghan...@amd.com v2->v3: * Fix table number in commit message. v1->v2: * No changes. include/linux/cper.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/cper.h b/include/linux/cper.h index d14ef4e77c8a..4b5f8459b403 100644 --- a/include/linux/cper.h +++ b/include/linux/cper.h @@ -381,7 +381,7 @@ struct cper_sec_proc_generic { /* IA32/X64 Processor Error Section */ struct cper_sec_proc_ia { __u64 validation_bits; - __u8lapic_id; + __u64 lapic_id; __u8cpuid[48]; }; -- 2.14.1
[PATCH v3 8/8] efi: Decode IA32/X64 Context Info structure
From: Yazen Ghannam Print the fields of the IA32/X64 Context Information structure. Print the "Register Array" as raw values. Some context types are defined in the UEFI spec, so more detailed decoded may be added in the future. Based on UEFI 2.7 section N.2.4.2.2 IA32/X64 Processor Context Information Structure. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180226193904.20532-9-yazen.ghan...@amd.com v2->v3: * No change. v1->v2: * Add parantheses around "bits" expression in macro. * Change VALID_PROC_CNTXT_INFO_NUM to VALID_PROC_CTX_INFO_NUM. * Fix indentation on multi-line statements. * Remove conditional to skip unknown context types. The context info should be printed even if the type is unknown. This is just like what we do for the error information. drivers/firmware/efi/cper-x86.c | 48 + 1 file changed, 48 insertions(+) diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c index ffc01483cac2..fe479cc5a358 100644 --- a/drivers/firmware/efi/cper-x86.c +++ b/drivers/firmware/efi/cper-x86.c @@ -12,6 +12,7 @@ #define VALID_LAPIC_ID BIT_ULL(0) #define VALID_CPUID_INFO BIT_ULL(1) #define VALID_PROC_ERR_INFO_NUM(bits) (((bits) & GENMASK_ULL(7, 2)) >> 2) +#define VALID_PROC_CXT_INFO_NUM(bits) (((bits) & GENMASK_ULL(13, 8)) >> 8) #define INFO_ERR_STRUCT_TYPE_CACHE \ GUID_INIT(0xA55701F5, 0xE3EF, 0x43DE, 0xAC, 0x72, 0x24, 0x9B, \ @@ -73,6 +74,9 @@ #define CHECK_MS_RESTARTABLE_IPBIT_ULL(22) #define CHECK_MS_OVERFLOW BIT_ULL(23) +#define CTX_TYPE_MSR 1 +#define CTX_TYPE_MMREG 7 + enum err_types { ERR_TYPE_CACHE = 0, ERR_TYPE_TLB, @@ -136,6 +140,17 @@ static const char * const ia_check_ms_error_type_strs[] = { "Internal Unclassified", }; +static const char * const ia_reg_ctx_strs[] = { + "Unclassified Data", + "MSR Registers (Machine Check and other MSRs)", + "32-bit Mode Execution Context", + "64-bit Mode Execution Context", + "FXSAVE Context", + "32-bit Mode Debug Registers (DR0-DR7)", + "64-bit Mode Debug Registers (DR0-DR7)", + "Memory Mapped Registers", +}; + static inline void print_bool(char *str, const char *pfx, u64 check, u64 bit) { printk("%s%s: %s\n", pfx, str, (check & bit) ? "true" : "false"); @@ -244,6 +259,7 @@ void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) { int i; struct cper_ia_err_info *err_info; + struct cper_ia_proc_ctx *ctx_info; char newpfx[64], infopfx[64]; u8 err_type; @@ -307,4 +323,36 @@ void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) err_info++; } + + ctx_info = (struct cper_ia_proc_ctx *)err_info; + for (i = 0; i < VALID_PROC_CXT_INFO_NUM(proc->validation_bits); i++) { + int size = sizeof(*ctx_info) + ctx_info->reg_arr_size; + int groupsize = 4; + + printk("%sContext Information Structure %d:\n", pfx, i); + + printk("%sRegister Context Type: %s\n", newpfx, + ctx_info->reg_ctx_type < ARRAY_SIZE(ia_reg_ctx_strs) ? + ia_reg_ctx_strs[ctx_info->reg_ctx_type] : "unknown"); + + printk("%sRegister Array Size: 0x%04x\n", newpfx, + ctx_info->reg_arr_size); + + if (ctx_info->reg_ctx_type == CTX_TYPE_MSR) { + groupsize = 8; /* MSRs are 8 bytes wide. */ + printk("%sMSR Address: 0x%08x\n", newpfx, + ctx_info->msr_addr); + } + + if (ctx_info->reg_ctx_type == CTX_TYPE_MMREG) { + printk("%sMM Register Address: 0x%016llx\n", newpfx, + ctx_info->mm_reg_addr); + } + + printk("%sRegister Array:\n", newpfx); + print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, groupsize, + (ctx_info + 1), ctx_info->reg_arr_size, 0); + + ctx_info = (struct cper_ia_proc_ctx *)((long)ctx_info + size); + } } -- 2.14.1
[PATCH v3 6/8] efi: Decode additional IA32/X64 Bus Check fields
From: Yazen Ghannam The "Participation Type", "Time Out", and "Address Space" fields are unique to the IA32/X64 Bus Check structure. Print these fields. Based on UEFI 2.7 Table 256. IA32/X64 Bus Check Structure Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180226193904.20532-7-yazen.ghan...@amd.com v2->v3: * Fix table number in commit message. v1->v2: * Add parantheses around "check" expression in macro. * Fix indentation on multi-line statements. drivers/firmware/efi/cper-x86.c | 44 + 1 file changed, 44 insertions(+) diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c index 260e4bf7773c..54bd32d241b7 100644 --- a/drivers/firmware/efi/cper-x86.c +++ b/drivers/firmware/efi/cper-x86.c @@ -41,6 +41,10 @@ #define CHECK_VALID_RESTARTABLE_IP BIT_ULL(6) #define CHECK_VALID_OVERFLOW BIT_ULL(7) +#define CHECK_VALID_BUS_PART_TYPE BIT_ULL(8) +#define CHECK_VALID_BUS_TIME_OUT BIT_ULL(9) +#define CHECK_VALID_BUS_ADDR_SPACE BIT_ULL(10) + #define CHECK_VALID_BITS(check)(((check) & GENMASK_ULL(15, 0))) #define CHECK_TRANS_TYPE(check)(((check) & GENMASK_ULL(17, 16)) >> 16) #define CHECK_OPERATION(check) (((check) & GENMASK_ULL(21, 18)) >> 18) @@ -51,6 +55,10 @@ #define CHECK_RESTARTABLE_IP BIT_ULL(28) #define CHECK_OVERFLOW BIT_ULL(29) +#define CHECK_BUS_PART_TYPE(check) (((check) & GENMASK_ULL(31, 30)) >> 30) +#define CHECK_BUS_TIME_OUT BIT_ULL(32) +#define CHECK_BUS_ADDR_SPACE(check)(((check) & GENMASK_ULL(34, 33)) >> 33) + enum err_types { ERR_TYPE_CACHE = 0, ERR_TYPE_TLB, @@ -91,6 +99,20 @@ static const char * const ia_check_op_strs[] = { "snoop", }; +static const char * const ia_check_bus_part_type_strs[] = { + "Local Processor originated request", + "Local Processor responded to request", + "Local Processor observed", + "Generic", +}; + +static const char * const ia_check_bus_addr_space_strs[] = { + "Memory Access", + "Reserved", + "I/O", + "Other Transaction", +}; + static inline void print_bool(char *str, const char *pfx, u64 check, u64 bit) { printk("%s%s: %s\n", pfx, str, (check & bit) ? "true" : "false"); @@ -141,6 +163,28 @@ static void print_err_info(const char *pfx, u8 err_type, u64 check) if (validation_bits & CHECK_VALID_OVERFLOW) print_bool("Overflow", pfx, check, CHECK_OVERFLOW); + + if (err_type != ERR_TYPE_BUS) + return; + + if (validation_bits & CHECK_VALID_BUS_PART_TYPE) { + u8 part_type = CHECK_BUS_PART_TYPE(check); + + printk("%sParticipation Type: %u, %s\n", pfx, part_type, + part_type < ARRAY_SIZE(ia_check_bus_part_type_strs) ? + ia_check_bus_part_type_strs[part_type] : "unknown"); + } + + if (validation_bits & CHECK_VALID_BUS_TIME_OUT) + print_bool("Time Out", pfx, check, CHECK_BUS_TIME_OUT); + + if (validation_bits & CHECK_VALID_BUS_ADDR_SPACE) { + u8 addr_space = CHECK_BUS_ADDR_SPACE(check); + + printk("%sAddress Space: %u, %s\n", pfx, addr_space, + addr_space < ARRAY_SIZE(ia_check_bus_addr_space_strs) ? + ia_check_bus_addr_space_strs[addr_space] : "unknown"); + } } void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) -- 2.14.1
[PATCH v3 7/8] efi: Decode IA32/X64 MS Check structure
From: Yazen Ghannam The IA32/X64 MS Check structure varies from the other Check structures in the the bit positions of its fields, and it includes an additional "Error Type" field. Decode the MS Check structure in a separate function. Based on UEFI 2.7 Table 257. IA32/X64 MS Check Field Description. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180226193904.20532-8-yazen.ghan...@amd.com v2->v3: * Fix table number in commit message. v1->v2: * Add parantheses around "check" expression in macro. * Fix indentation on multi-line statements. drivers/firmware/efi/cper-x86.c | 55 - 1 file changed, 54 insertions(+), 1 deletion(-) diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c index 54bd32d241b7..ffc01483cac2 100644 --- a/drivers/firmware/efi/cper-x86.c +++ b/drivers/firmware/efi/cper-x86.c @@ -59,6 +59,20 @@ #define CHECK_BUS_TIME_OUT BIT_ULL(32) #define CHECK_BUS_ADDR_SPACE(check)(((check) & GENMASK_ULL(34, 33)) >> 33) +#define CHECK_VALID_MS_ERR_TYPEBIT_ULL(0) +#define CHECK_VALID_MS_PCC BIT_ULL(1) +#define CHECK_VALID_MS_UNCORRECTED BIT_ULL(2) +#define CHECK_VALID_MS_PRECISE_IP BIT_ULL(3) +#define CHECK_VALID_MS_RESTARTABLE_IP BIT_ULL(4) +#define CHECK_VALID_MS_OVERFLOWBIT_ULL(5) + +#define CHECK_MS_ERR_TYPE(check) (((check) & GENMASK_ULL(18, 16)) >> 16) +#define CHECK_MS_PCC BIT_ULL(19) +#define CHECK_MS_UNCORRECTED BIT_ULL(20) +#define CHECK_MS_PRECISE_IPBIT_ULL(21) +#define CHECK_MS_RESTARTABLE_IPBIT_ULL(22) +#define CHECK_MS_OVERFLOW BIT_ULL(23) + enum err_types { ERR_TYPE_CACHE = 0, ERR_TYPE_TLB, @@ -113,17 +127,56 @@ static const char * const ia_check_bus_addr_space_strs[] = { "Other Transaction", }; +static const char * const ia_check_ms_error_type_strs[] = { + "No Error", + "Unclassified", + "Microcode ROM Parity Error", + "External Error", + "FRC Error", + "Internal Unclassified", +}; + static inline void print_bool(char *str, const char *pfx, u64 check, u64 bit) { printk("%s%s: %s\n", pfx, str, (check & bit) ? "true" : "false"); } +static void print_err_info_ms(const char *pfx, u16 validation_bits, u64 check) +{ + if (validation_bits & CHECK_VALID_MS_ERR_TYPE) { + u8 err_type = CHECK_MS_ERR_TYPE(check); + + printk("%sError Type: %u, %s\n", pfx, err_type, + err_type < ARRAY_SIZE(ia_check_ms_error_type_strs) ? + ia_check_ms_error_type_strs[err_type] : "unknown"); + } + + if (validation_bits & CHECK_VALID_MS_PCC) + print_bool("Processor Context Corrupt", pfx, check, CHECK_MS_PCC); + + if (validation_bits & CHECK_VALID_MS_UNCORRECTED) + print_bool("Uncorrected", pfx, check, CHECK_MS_UNCORRECTED); + + if (validation_bits & CHECK_VALID_MS_PRECISE_IP) + print_bool("Precise IP", pfx, check, CHECK_MS_PRECISE_IP); + + if (validation_bits & CHECK_VALID_MS_RESTARTABLE_IP) + print_bool("Restartable IP", pfx, check, CHECK_MS_RESTARTABLE_IP); + + if (validation_bits & CHECK_VALID_MS_OVERFLOW) + print_bool("Overflow", pfx, check, CHECK_MS_OVERFLOW); +} + static void print_err_info(const char *pfx, u8 err_type, u64 check) { u16 validation_bits = CHECK_VALID_BITS(check); + /* +* The MS Check structure varies a lot from the others, so use a +* separate function for decoding. +*/ if (err_type == ERR_TYPE_MS) - return; + return print_err_info_ms(pfx, validation_bits, check); if (validation_bits & CHECK_VALID_TRANS_TYPE) { u8 trans_type = CHECK_TRANS_TYPE(check); -- 2.14.1
[PATCH v3 0/8] Decode IA32/X64 CPER
From: Yazen Ghannam This series adds decoding for the IA32/X64 Common Platform Error Record. Patch 1 fixes the IA32/X64 Processor Error Section definition to match the UEFI spec. Patches 2-8 add the new decoding. The patches incrementally add the decoding starting from the top-level "Error Section". Hopefully, this will make reviewing a bit easier compared to one large patch. The formatting of the field names and options is taken from the UEFI spec. I tried to keep everything the same to make searching easier. The patches were written to the UEFI 2.7 spec though the definition of the IA32/X64 CPER seems to be the same as when it was introduced in the UEFI 2.1 spec. Link: https://lkml.kernel.org/r/20180226193904.20532-1-yazen.ghan...@amd.com Changes V2 to V3: * Fix table numbers in commit messages. * Don't print raw validation bits. * Only print GUID for unknown error types. Changes V1 to V2: * Remove stable request for all patches. * Address Ard's comments on formatting and other issues. * In Patch 8, always print context info even if the type is not recognized. Yazen Ghannam (8): efi: Fix IA32/X64 Processor Error Record definition efi: Decode IA32/X64 Processor Error Section efi: Decode IA32/X64 Processor Error Info Structure efi: Decode UEFI-defined IA32/X64 Error Structure GUIDs efi: Decode IA32/X64 Cache, TLB, and Bus Check structures efi: Decode additional IA32/X64 Bus Check fields efi: Decode IA32/X64 MS Check structure efi: Decode IA32/X64 Context Info structure drivers/firmware/efi/Kconfig| 5 + drivers/firmware/efi/Makefile | 1 + drivers/firmware/efi/cper-x86.c | 358 drivers/firmware/efi/cper.c | 10 ++ include/linux/cper.h| 4 +- 5 files changed, 377 insertions(+), 1 deletion(-) create mode 100644 drivers/firmware/efi/cper-x86.c -- 2.14.1
[PATCH v3 5/8] efi: Decode IA32/X64 Cache, TLB, and Bus Check structures
From: Yazen Ghannam Print the common fields of the Cache, TLB, and Bus check structures.The fields of these three check types are the same except for a few more fields in the Bus check structure. The remaining Bus check structure fields will be decoded in a following patch. Based on UEFI 2.7, Table 254. IA32/X64 Cache Check Structure Table 255. IA32/X64 TLB Check Structure Table 256. IA32/X64 Bus Check Structure Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180226193904.20532-6-yazen.ghan...@amd.com v2->v3: * Fix table numbers in commit message. * Don't print raw validation bits. v1->v2: * Add parantheses around "check" expression in macro. * Change use of enum type to u8. * Fix indentation on multi-line statements. drivers/firmware/efi/cper-x86.c | 99 - 1 file changed, 98 insertions(+), 1 deletion(-) diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c index 57fbea5b3a8c..260e4bf7773c 100644 --- a/drivers/firmware/efi/cper-x86.c +++ b/drivers/firmware/efi/cper-x86.c @@ -32,6 +32,25 @@ #define INFO_VALID_RESPONDER_IDBIT_ULL(3) #define INFO_VALID_IP BIT_ULL(4) +#define CHECK_VALID_TRANS_TYPE BIT_ULL(0) +#define CHECK_VALID_OPERATION BIT_ULL(1) +#define CHECK_VALID_LEVEL BIT_ULL(2) +#define CHECK_VALID_PCCBIT_ULL(3) +#define CHECK_VALID_UNCORRECTEDBIT_ULL(4) +#define CHECK_VALID_PRECISE_IP BIT_ULL(5) +#define CHECK_VALID_RESTARTABLE_IP BIT_ULL(6) +#define CHECK_VALID_OVERFLOW BIT_ULL(7) + +#define CHECK_VALID_BITS(check)(((check) & GENMASK_ULL(15, 0))) +#define CHECK_TRANS_TYPE(check)(((check) & GENMASK_ULL(17, 16)) >> 16) +#define CHECK_OPERATION(check) (((check) & GENMASK_ULL(21, 18)) >> 18) +#define CHECK_LEVEL(check) (((check) & GENMASK_ULL(24, 22)) >> 22) +#define CHECK_PCC BIT_ULL(25) +#define CHECK_UNCORRECTED BIT_ULL(26) +#define CHECK_PRECISE_IP BIT_ULL(27) +#define CHECK_RESTARTABLE_IP BIT_ULL(28) +#define CHECK_OVERFLOW BIT_ULL(29) + enum err_types { ERR_TYPE_CACHE = 0, ERR_TYPE_TLB, @@ -54,11 +73,81 @@ static enum err_types cper_get_err_type(const guid_t *err_type) return N_ERR_TYPES; } +static const char * const ia_check_trans_type_strs[] = { + "Instruction", + "Data Access", + "Generic", +}; + +static const char * const ia_check_op_strs[] = { + "generic error", + "generic read", + "generic write", + "data read", + "data write", + "instruction fetch", + "prefetch", + "eviction", + "snoop", +}; + +static inline void print_bool(char *str, const char *pfx, u64 check, u64 bit) +{ + printk("%s%s: %s\n", pfx, str, (check & bit) ? "true" : "false"); +} + +static void print_err_info(const char *pfx, u8 err_type, u64 check) +{ + u16 validation_bits = CHECK_VALID_BITS(check); + + if (err_type == ERR_TYPE_MS) + return; + + if (validation_bits & CHECK_VALID_TRANS_TYPE) { + u8 trans_type = CHECK_TRANS_TYPE(check); + + printk("%sTransaction Type: %u, %s\n", pfx, trans_type, + trans_type < ARRAY_SIZE(ia_check_trans_type_strs) ? + ia_check_trans_type_strs[trans_type] : "unknown"); + } + + if (validation_bits & CHECK_VALID_OPERATION) { + u8 op = CHECK_OPERATION(check); + + /* +* CACHE has more operation types than TLB or BUS, though the +* name and the order are the same. +*/ + u8 max_ops = (err_type == ERR_TYPE_CACHE) ? 9 : 7; + + printk("%sOperation: %u, %s\n", pfx, op, + op < max_ops ? ia_check_op_strs[op] : "unknown"); + } + + if (validation_bits & CHECK_VALID_LEVEL) + printk("%sLevel: %llu\n", pfx, CHECK_LEVEL(check)); + + if (validation_bits & CHECK_VALID_PCC) + print_bool("Processor Context Corrupt", pfx, check, CHECK_PCC); + + if (validation_bits & CHECK_VALID_UNCORRECTED) + print_bool("Uncorrected", pfx, check, CHECK_UNCORRECTED); + + if (validation_bits & CHECK_VALID_PRECISE_IP) + print_bool("Precise IP", pfx, check, CHECK_PRECISE_IP); + + if (validation_bits & CHECK_VALID_RESTARTABLE_IP) + print_bool("Restartable IP", pfx, check, CHECK_RESTARTABLE_IP); + + if (va
[PATCH v3 3/8] efi: Decode IA32/X64 Processor Error Info Structure
From: Yazen Ghannam Print the fields in the IA32/X64 Processor Error Info Structure. Based on UEFI 2.7 Table 253. IA32/X64 Processor Error Information Structure. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180226193904.20532-4-yazen.ghan...@amd.com v2->v3: * Fix table number in commit message. * Don't print raw validation bits. v1->v2: * Add parantheses around "bits" expression in macro. * Fix indentation on multi-line statements. drivers/firmware/efi/cper-x86.c | 50 + 1 file changed, 50 insertions(+) diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c index 863f0cd2a0ff..a9ab3bbf7986 100644 --- a/drivers/firmware/efi/cper-x86.c +++ b/drivers/firmware/efi/cper-x86.c @@ -3,15 +3,28 @@ #include +#define INDENT_SP " " + /* * We don't need a "CPER_IA" prefix since these are all locally defined. * This will save us a lot of line space. */ #define VALID_LAPIC_ID BIT_ULL(0) #define VALID_CPUID_INFO BIT_ULL(1) +#define VALID_PROC_ERR_INFO_NUM(bits) (((bits) & GENMASK_ULL(7, 2)) >> 2) + +#define INFO_VALID_CHECK_INFO BIT_ULL(0) +#define INFO_VALID_TARGET_ID BIT_ULL(1) +#define INFO_VALID_REQUESTOR_IDBIT_ULL(2) +#define INFO_VALID_RESPONDER_IDBIT_ULL(3) +#define INFO_VALID_IP BIT_ULL(4) void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) { + int i; + struct cper_ia_err_info *err_info; + char newpfx[64]; + if (proc->validation_bits & VALID_LAPIC_ID) printk("%sLocal APIC_ID: 0x%llx\n", pfx, proc->lapic_id); @@ -20,4 +33,41 @@ void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) print_hex_dump(pfx, "", DUMP_PREFIX_OFFSET, 16, 4, proc->cpuid, sizeof(proc->cpuid), 0); } + + snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP); + + err_info = (struct cper_ia_err_info *)(proc + 1); + for (i = 0; i < VALID_PROC_ERR_INFO_NUM(proc->validation_bits); i++) { + printk("%sError Information Structure %d:\n", pfx, i); + + printk("%sError Structure Type: %pUl\n", newpfx, + &err_info->err_type); + + if (err_info->validation_bits & INFO_VALID_CHECK_INFO) { + printk("%sCheck Information: 0x%016llx\n", newpfx, + err_info->check_info); + } + + if (err_info->validation_bits & INFO_VALID_TARGET_ID) { + printk("%sTarget Identifier: 0x%016llx\n", + newpfx, err_info->target_id); + } + + if (err_info->validation_bits & INFO_VALID_REQUESTOR_ID) { + printk("%sRequestor Identifier: 0x%016llx\n", + newpfx, err_info->requestor_id); + } + + if (err_info->validation_bits & INFO_VALID_RESPONDER_ID) { + printk("%sResponder Identifier: 0x%016llx\n", + newpfx, err_info->responder_id); + } + + if (err_info->validation_bits & INFO_VALID_IP) { + printk("%sInstruction Pointer: 0x%016llx\n", + newpfx, err_info->ip); + } + + err_info++; + } } -- 2.14.1
[PATCH v3 4/8] efi: Decode UEFI-defined IA32/X64 Error Structure GUIDs
From: Yazen Ghannam For easier handling, match the known IA32/X64 error structure GUIDs to enums. Also, print out the name of the matching Error Structure Type. Only print the GUID for unknown types. GUIDs taken from UEFI 2.7 section N.2.4.2.1 IA32/X64 Processor Error Information Structure. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180226193904.20532-5-yazen.ghan...@amd.com v2->v3: * Only print raw GUID for unknown error types. v1->v2: * Change use of enum type to u8. * Fix indentation on multi-line statements. drivers/firmware/efi/cper-x86.c | 47 +++-- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c index a9ab3bbf7986..57fbea5b3a8c 100644 --- a/drivers/firmware/efi/cper-x86.c +++ b/drivers/firmware/efi/cper-x86.c @@ -13,17 +13,53 @@ #define VALID_CPUID_INFO BIT_ULL(1) #define VALID_PROC_ERR_INFO_NUM(bits) (((bits) & GENMASK_ULL(7, 2)) >> 2) +#define INFO_ERR_STRUCT_TYPE_CACHE \ + GUID_INIT(0xA55701F5, 0xE3EF, 0x43DE, 0xAC, 0x72, 0x24, 0x9B, \ + 0x57, 0x3F, 0xAD, 0x2C) +#define INFO_ERR_STRUCT_TYPE_TLB \ + GUID_INIT(0xFC06B535, 0x5E1F, 0x4562, 0x9F, 0x25, 0x0A, 0x3B, \ + 0x9A, 0xDB, 0x63, 0xC3) +#define INFO_ERR_STRUCT_TYPE_BUS \ + GUID_INIT(0x1CF3F8B3, 0xC5B1, 0x49a2, 0xAA, 0x59, 0x5E, 0xEF, \ + 0x92, 0xFF, 0xA6, 0x3C) +#define INFO_ERR_STRUCT_TYPE_MS \ + GUID_INIT(0x48AB7F57, 0xDC34, 0x4f6c, 0xA7, 0xD3, 0xB0, 0xB5, \ + 0xB0, 0xA7, 0x43, 0x14) + #define INFO_VALID_CHECK_INFO BIT_ULL(0) #define INFO_VALID_TARGET_ID BIT_ULL(1) #define INFO_VALID_REQUESTOR_IDBIT_ULL(2) #define INFO_VALID_RESPONDER_IDBIT_ULL(3) #define INFO_VALID_IP BIT_ULL(4) +enum err_types { + ERR_TYPE_CACHE = 0, + ERR_TYPE_TLB, + ERR_TYPE_BUS, + ERR_TYPE_MS, + N_ERR_TYPES +}; + +static enum err_types cper_get_err_type(const guid_t *err_type) +{ + if (guid_equal(err_type, &INFO_ERR_STRUCT_TYPE_CACHE)) + return ERR_TYPE_CACHE; + else if (guid_equal(err_type, &INFO_ERR_STRUCT_TYPE_TLB)) + return ERR_TYPE_TLB; + else if (guid_equal(err_type, &INFO_ERR_STRUCT_TYPE_BUS)) + return ERR_TYPE_BUS; + else if (guid_equal(err_type, &INFO_ERR_STRUCT_TYPE_MS)) + return ERR_TYPE_MS; + else + return N_ERR_TYPES; +} + void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) { int i; struct cper_ia_err_info *err_info; char newpfx[64]; + u8 err_type; if (proc->validation_bits & VALID_LAPIC_ID) printk("%sLocal APIC_ID: 0x%llx\n", pfx, proc->lapic_id); @@ -40,8 +76,15 @@ void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) for (i = 0; i < VALID_PROC_ERR_INFO_NUM(proc->validation_bits); i++) { printk("%sError Information Structure %d:\n", pfx, i); - printk("%sError Structure Type: %pUl\n", newpfx, - &err_info->err_type); + err_type = cper_get_err_type(&err_info->err_type); + printk("%sError Structure Type: %s\n", newpfx, + err_type < ARRAY_SIZE(cper_proc_error_type_strs) ? + cper_proc_error_type_strs[err_type] : "unknown"); + + if (err_type >= N_ERR_TYPES) { + printk("%sError Structure Type: %pUl\n", newpfx, + &err_info->err_type); + } if (err_info->validation_bits & INFO_VALID_CHECK_INFO) { printk("%sCheck Information: 0x%016llx\n", newpfx, -- 2.14.1
[PATCH v3 2/8] efi: Decode IA32/X64 Processor Error Section
From: Yazen Ghannam Recognize the IA32/X64 Processor Error Section. Do the section decoding in a new "cper-x86.c" file and add this to the Makefile depending on a new "UEFI_CPER_X86" config option. Print the Local APIC ID and CPUID info from the Processor Error Record. The "Processor Error Info" and "Processor Context" fields will be decoded in following patches. Based on UEFI 2.7 Table 252. Processor Error Record. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180226193904.20532-3-yazen.ghan...@amd.com v2->v3: * Fix table number in commit message. * Don't print raw validation bits. v1->v2: * Change config option depends to "X86" instead of "X86_32 || X64_64". * Remove extra newline in Makefile changes. * Drop author copyright line. drivers/firmware/efi/Kconfig| 5 + drivers/firmware/efi/Makefile | 1 + drivers/firmware/efi/cper-x86.c | 23 +++ drivers/firmware/efi/cper.c | 10 ++ include/linux/cper.h| 2 ++ 5 files changed, 41 insertions(+) create mode 100644 drivers/firmware/efi/cper-x86.c diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig index 3098410abad8..781a4a337557 100644 --- a/drivers/firmware/efi/Kconfig +++ b/drivers/firmware/efi/Kconfig @@ -174,6 +174,11 @@ config UEFI_CPER_ARM depends on UEFI_CPER && ( ARM || ARM64 ) default y +config UEFI_CPER_X86 + bool + depends on UEFI_CPER && X86 + default y + config EFI_DEV_PATH_PARSER bool depends on ACPI diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile index cb805374f4bc..5f9f5039de50 100644 --- a/drivers/firmware/efi/Makefile +++ b/drivers/firmware/efi/Makefile @@ -31,3 +31,4 @@ obj-$(CONFIG_ARM) += $(arm-obj-y) obj-$(CONFIG_ARM64)+= $(arm-obj-y) obj-$(CONFIG_EFI_CAPSULE_LOADER) += capsule-loader.o obj-$(CONFIG_UEFI_CPER_ARM)+= cper-arm.o +obj-$(CONFIG_UEFI_CPER_X86)+= cper-x86.o diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c new file mode 100644 index ..863f0cd2a0ff --- /dev/null +++ b/drivers/firmware/efi/cper-x86.c @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (C) 2018, Advanced Micro Devices, Inc. + +#include + +/* + * We don't need a "CPER_IA" prefix since these are all locally defined. + * This will save us a lot of line space. + */ +#define VALID_LAPIC_ID BIT_ULL(0) +#define VALID_CPUID_INFO BIT_ULL(1) + +void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) +{ + if (proc->validation_bits & VALID_LAPIC_ID) + printk("%sLocal APIC_ID: 0x%llx\n", pfx, proc->lapic_id); + + if (proc->validation_bits & VALID_CPUID_INFO) { + printk("%sCPUID Info:\n", pfx); + print_hex_dump(pfx, "", DUMP_PREFIX_OFFSET, 16, 4, proc->cpuid, + sizeof(proc->cpuid), 0); + } +} diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c index c165933ebf38..5a59b582c9aa 100644 --- a/drivers/firmware/efi/cper.c +++ b/drivers/firmware/efi/cper.c @@ -469,6 +469,16 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata cper_print_proc_arm(newpfx, arm_err); else goto err_section_too_small; +#endif +#if defined(CONFIG_UEFI_CPER_X86) + } else if (guid_equal(sec_type, &CPER_SEC_PROC_IA)) { + struct cper_sec_proc_ia *ia_err = acpi_hest_get_payload(gdata); + + printk("%ssection_type: IA32/X64 processor error\n", newpfx); + if (gdata->error_data_length >= sizeof(*ia_err)) + cper_print_proc_ia(newpfx, ia_err); + else + goto err_section_too_small; #endif } else { const void *err = acpi_hest_get_payload(gdata); diff --git a/include/linux/cper.h b/include/linux/cper.h index 4b5f8459b403..9c703a0abe6e 100644 --- a/include/linux/cper.h +++ b/include/linux/cper.h @@ -551,5 +551,7 @@ const char *cper_mem_err_unpack(struct trace_seq *, struct cper_mem_err_compact *); void cper_print_proc_arm(const char *pfx, const struct cper_sec_proc_arm *proc); +void cper_print_proc_ia(const char *pfx, + const struct cper_sec_proc_ia *proc); #endif -- 2.14.1
[PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
From: Yazen Ghannam This reverts commit 4b1e84276a6172980c5bf39aa091ba13e90d6dad. Software uses the valid bits to decide if the values can be used for further processing or other actions. So setting the valid bits will have software act on values that it shouldn't be acting on. The recommendation to save all the register values does not mean that the values are always valid. Signed-off-by: Yazen Ghannam --- arch/x86/kernel/cpu/mcheck/mce.c | 14 -- 1 file changed, 14 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 86079b90ebcf..42cf2880d0ed 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -446,20 +446,6 @@ static inline void mce_gather_info(struct mce *m, struct pt_regs *regs) if (mca_cfg.rip_msr) m->ip = mce_rdmsrl(mca_cfg.rip_msr); } - - /* -* Error handlers should save the values in MCA_ADDR, MCA_MISC0, and -* MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and -* MCA_STATUS[SyndV] are zero. -*/ - if (m->cpuvendor == X86_VENDOR_AMD) { - u64 status = MCI_STATUS_ADDRV | MCI_STATUS_MISCV; - - if (mce_flags.smca) - status |= MCI_STATUS_SYNDV; - - m->status |= status; - } } int mce_available(struct cpuinfo_x86 *c) -- 2.14.1
[PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents
From: Yazen Ghannam The Intel SDM and AMD APM both state that the contents of the MCA_ADDR register should be saved if MCA_STATUS[ADDRV] is set. The same applies to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid bits. However, the Fam17h Processor Programming Reference states "Error handlers should save the values in MCA_ADDR, MCA_MISC0, and MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and MCA_STATUS[SyndV] are zero." This is to ensure that all MCA state information is collected even if software cannot act upon it (because the valid bits are cleared). So always save the auxiliary MCA register contents even if the valid bits are cleared. This should not affect error processing because software should still check the valid bits before using the register contents for error processing. Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set. Printing from EDAC/mce_amd is included here since we want to do this on AMD systems. Signed-off-by: Yazen Ghannam --- arch/x86/kernel/cpu/mcheck/mce.c | 23 +++ arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++--- drivers/edac/mce_amd.c | 10 +++--- 3 files changed, 13 insertions(+), 30 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 42cf2880d0ed..a556e1cadfbc 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -248,19 +248,14 @@ static void __print_mce(struct mce *m) } pr_emerg(HW_ERR "TSC %llx ", m->tsc); - if (m->addr) - pr_cont("ADDR %llx ", m->addr); - if (m->misc) - pr_cont("MISC %llx ", m->misc); + pr_cont("ADDR %016llx ", m->addr); + pr_cont("MISC %016llx\n", m->misc); if (mce_flags.smca) { - if (m->synd) - pr_cont("SYND %llx ", m->synd); - if (m->ipid) - pr_cont("IPID %llx ", m->ipid); + pr_emerg(HW_ERR "IPID %016llx ", m->ipid); + pr_cont("SYND %016llx\n", m->synd); } - pr_cont("\n"); /* * Note this output is parsed by external tools and old fields * should not be changed. @@ -639,12 +634,10 @@ static struct notifier_block mce_default_nb = { */ static void mce_read_aux(struct mce *m, int i) { - if (m->status & MCI_STATUS_MISCV) - m->misc = mce_rdmsrl(msr_ops.misc(i)); + m->misc = mce_rdmsrl(msr_ops.misc(i)); + m->addr = mce_rdmsrl(msr_ops.addr(i)); if (m->status & MCI_STATUS_ADDRV) { - m->addr = mce_rdmsrl(msr_ops.addr(i)); - /* * Mask the reported address by the reported granularity. */ @@ -667,9 +660,7 @@ static void mce_read_aux(struct mce *m, int i) if (mce_flags.smca) { m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i)); - - if (m->status & MCI_STATUS_SYNDV) - m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i)); + m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i)); } } diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index f7666eef4a87..5a37ae704578 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -799,13 +799,12 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc) mce_setup(&m); m.status = status; + m.addr = addr; m.misc = misc; m.bank = bank; m.tsc= rdtsc(); if (m.status & MCI_STATUS_ADDRV) { - m.addr = addr; - /* * Extract [55:] where lsb is the least significant * *valid* bit of the address bits. @@ -819,9 +818,7 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc) if (mce_flags.smca) { rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m.ipid); - - if (m.status & MCI_STATUS_SYNDV) - rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd); + rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd); } mce_log(&m); @@ -849,8 +846,7 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc) if (!(status & MCI_STATUS_VAL)) return false; - if (status & MCI_STATUS_ADDRV) - rdmsrl(msr_addr, addr); + rdmsrl(msr_addr, addr); __log_error(bank, status, addr, misc); diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c index 2ab4d61ee47e..004425cc8ddf 100644 --- a/drivers/edac/mce_amd.c +++ b/drivers/edac/mce_amd.c @@ -990,16 +990,12 @@ amd_decode_mce(st
[PATCH 2/3] EDAC/amd64: Only remove instances that exist
From: Yazen Ghannam An instance may have failed probing because the probed node did not have DRAM installed. When the module is unloaded we'll get a WARNing when we try to remove a non-existent instance. Save a bitmask of enabled instances with a bit for each node. Only try to remove an instance if it was successfully enabled in the first place. Cc: # 4.14.x Signed-off-by: Yazen Ghannam --- drivers/edac/amd64_edac.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index af0ce9aa8d24..054086b19c6c 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -13,6 +13,8 @@ module_param(report_gart_errors, int, 0644); static int ecc_enable_override; module_param(ecc_enable_override, int, 0644); +static unsigned long int enabled_instances; + static struct msr __percpu *msrs; /* Per-node stuff */ @@ -3366,6 +3368,7 @@ static int probe_one_instance(unsigned int nid) goto err_enable; } + set_bit(nid, &enabled_instances); return ret; err_enable: @@ -3383,6 +3386,9 @@ static void remove_one_instance(unsigned int nid) struct mem_ctl_info *mci; struct amd64_pvt *pvt; + if (!test_bit(nid, &enabled_instances)) + return; + mci = find_mci_by_dev(&F3->dev); WARN_ON(!mci); @@ -3405,6 +3411,8 @@ static void remove_one_instance(unsigned int nid) kfree(pvt); edac_mc_free(mci); + + clear_bit(nid, &enabled_instances); } static void setup_pci_device(void) -- 2.14.1
[PATCH 1/3] EDAC/amd64: Print ECC enabled/disabled for nodes with enabled MCs
From: Yazen Ghannam It's possible that a system can be used without any DRAM populated on one or more physical Dies on multi-die systems. Firmware will not enable DRAM ECC on Dies without DRAM. Users will then see a message about DRAM ECC disabled on those nodes without DRAM. However, DRAM ECC may, in fact, be enabled on the other Dies that have DRAM. Only print ECC enabled/disabled information for nodes that have at least one enabled memory channel. A memory channel that is unused, i.e. has no DRAM, should be seen as disabled. DRAM ECC information is not relevant on nodes without DRAM. Cc: # 4.14.x Signed-off-by: Yazen Ghannam --- drivers/edac/amd64_edac.c | 20 ++-- 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 329cb96f886f..af0ce9aa8d24 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -3035,6 +3035,7 @@ static const char *ecc_msg = static bool ecc_enabled(struct pci_dev *F3, u16 nid) { bool nb_mce_en = false; + bool mc_en = true; u8 ecc_en = 0, i; u32 value; @@ -3060,6 +3061,8 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid) ecc_en_mask |= BIT(i); } + mc_en = !!umc_en_mask; + /* Check whether at least one UMC is enabled: */ if (umc_en_mask) ecc_en = umc_en_mask == ecc_en_mask; @@ -3079,14 +3082,19 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid) MSR_IA32_MCG_CTL, nid); } - amd64_info("Node %d: DRAM ECC %s.\n", - nid, (ecc_en ? "enabled" : "disabled")); + /* +* Only print ECC enabled/disabled messages for nodes with enabled +* memory controllers. +*/ + if (mc_en) { + amd64_info("Node %d: DRAM ECC %s.\n", + nid, (ecc_en ? "enabled" : "disabled")); - if (!ecc_en || !nb_mce_en) { - amd64_info("%s", ecc_msg); - return false; + if (!ecc_en || !nb_mce_en) + amd64_info("%s", ecc_msg); } - return true; + + return ecc_en && nb_mce_en; } static inline void -- 2.14.1
[PATCH 3/3] EDAC/amd64: Add DIMM device type for Fam17h
From: Yazen Ghannam Set the DIMM device type for Fam17h. Cc: # 4.14.x Signed-off-by: Yazen Ghannam --- drivers/edac/amd64_edac.c | 17 ++--- drivers/edac/amd64_edac.h | 1 + 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 054086b19c6c..59d0085e4ffa 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -998,7 +998,7 @@ static void read_dct_base_mask(struct amd64_pvt *pvt) static void determine_memory_type(struct amd64_pvt *pvt) { - u32 dram_ctrl, dcsm; + u32 dram_ctrl, dcsm, dimm_cfg; switch (pvt->fam) { case 0xf: @@ -1046,12 +1046,22 @@ static void determine_memory_type(struct amd64_pvt *pvt) goto ddr3; case 0x17: - if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5)) + dimm_cfg = pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg; + + if (dimm_cfg & BIT(5)) pvt->dram_type = MEM_LRDDR4; - else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4)) + else if (dimm_cfg & BIT(4)) pvt->dram_type = MEM_RDDR4; else pvt->dram_type = MEM_DDR4; + + if (dimm_cfg & BIT(7)) + pvt->dev_type = DEV_X16; + else if (dimm_cfg & BIT(6)) + pvt->dev_type = DEV_X4; + else + pvt->dev_type = DEV_UNKNOWN; + return; default: @@ -2855,6 +2865,7 @@ static int init_csrows(struct mem_ctl_info *mci) for (j = 0; j < pvt->channel_count; j++) { dimm = csrow->channels[j]->dimm; dimm->mtype = pvt->dram_type; + dimm->dtype = pvt->dev_type; dimm->edac_mode = edac_mode; } } diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h index 1d4b74e9a037..47696d3ce487 100644 --- a/drivers/edac/amd64_edac.h +++ b/drivers/edac/amd64_edac.h @@ -368,6 +368,7 @@ struct amd64_pvt { /* cache the dram_type */ enum mem_type dram_type; + enum dev_type dev_type; struct amd64_umc *umc; /* UMC registers */ }; -- 2.14.1
Re: [PATCH 0/4] MCE wrapper and support for new SMCA syndrome MSRs
On Fri, Jun 21, 2024 at 06:58:23PM +0200, Borislav Petkov wrote: > On Thu, May 30, 2024 at 04:16:16PM -0500, Avadhut Naik wrote: > > arch/x86/include/asm/mce.h | 20 ++- > > arch/x86/kernel/cpu/mce/apei.c | 111 ++ > > arch/x86/kernel/cpu/mce/core.c | 191 ++-- > > arch/x86/kernel/cpu/mce/dev-mcelog.c| 2 +- > > arch/x86/kernel/cpu/mce/genpool.c | 20 +-- > > arch/x86/kernel/cpu/mce/inject.c| 4 +- > > arch/x86/kernel/cpu/mce/internal.h | 4 +- > > drivers/acpi/acpi_extlog.c | 2 +- > > drivers/acpi/nfit/mce.c | 2 +- > > drivers/edac/i7core_edac.c | 2 +- > > drivers/edac/igen6_edac.c | 2 +- > > drivers/edac/mce_amd.c | 27 +++- > > drivers/edac/pnd2_edac.c| 2 +- > > drivers/edac/sb_edac.c | 2 +- > > drivers/edac/skx_common.c | 2 +- > > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +- > > drivers/ras/amd/fmpm.c | 2 +- > > drivers/ras/cec.c | 2 +- > > include/trace/events/mce.h | 51 --- > > 19 files changed, 286 insertions(+), 164 deletions(-) > > This doesn't apply anymore - please redo this ontop of the latest tip/master. > Avadhut, You can drop the dependencies on other sets. We can sort out any conflicts as needed. Thanks, Yazen
Re: [PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA
On Wed, Jun 26, 2024 at 08:20:13PM +0200, Borislav Petkov wrote: > On Wed, Jun 26, 2024 at 01:00:30PM -0500, Naik, Avadhut wrote: > > > > > > Why are you clearing it if you're overwriting it immediately? > > > > > Since its a local variable, wanted to ensure that the memory is zeroed out > > to prevent > > any issues with the %s specifier, used later on. > > What issues? > > > Would you recommend removing that and using initializer instead for the > > string? > > I'd recommend looking at what the code does and then really thinking whether > that makes any sense. > We need to make sure the string is NULL-terminated. So the memset() could be replaced with this: frutext[16] = '\0'; Or better yet, maybe we can use scnprintf() or similar. Thanks, Yazen
[PATCH v2] x86/smpboot: Don't do mwait_play_dead() on AMD systems
From: Yazen Ghannam Recent AMD systems support using MWAIT for C1 state. However, MWAIT will not allow deeper cstates than C1 on current systems. With play_dead() we expect the OS to use the deepest state available. The deepest state available on AMD systems is reached through SystemIO or HALT. If MWAIT is available, we use it instead of the other methods, so we never reach the deepest state. Don't try to use MWAIT to play_dead() on AMD systems. Instead, we'll use CPUIDLE to enter the deepest state advertised by firmware. If CPUIDLE is not available then we fallback to HALT. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180402183424.48222-1-yazen.ghan...@amd.com v1->v2: * Drop comment in code. arch/x86/kernel/smpboot.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index ff99e2b6fc54..12599e55e040 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1536,6 +1536,8 @@ static inline void mwait_play_dead(void) void *mwait_ptr; int i; + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) + return; if (!this_cpu_has(X86_FEATURE_MWAIT)) return; if (!this_cpu_has(X86_FEATURE_CLFLUSH)) -- 2.14.1
[PATCH] x86/mce: Increase maximum number of banks to 64
From: Akshay Gupta ...because future AMD systems will support up to 64 MCA banks per CPU. MAX_NR_BANKS is used to allocate a number of data structures, and it is used as a ceiling for values read from MCG_CAP[Count]. Therefore, this change will have no functional effect on existing systems with 32 or fewer MCA banks per CPU. Signed-off-by: Akshay Gupta [ Adjust commit message and code comment. ] Signed-off-by: Yazen Ghannam --- arch/x86/include/asm/mce.h | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index 6adced6e7dd3..109af5c7f515 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -200,12 +200,8 @@ void mce_setup(struct mce *m); void mce_log(struct mce *m); DECLARE_PER_CPU(struct device *, mce_device); -/* - * Maximum banks number. - * This is the limit of the current register layout on - * Intel CPUs. - */ -#define MAX_NR_BANKS 32 +/* Maximum number of MCA banks per CPU. */ +#define MAX_NR_BANKS 64 #ifdef CONFIG_X86_MCE_INTEL void mce_intel_feature_init(struct cpuinfo_x86 *c); -- 2.25.1
Re: [PATCH] x86/mce: Increase maximum number of banks to 64
On Thu, Aug 20, 2020 at 07:15:18PM +0200, Borislav Petkov wrote: > On Thu, Aug 20, 2020 at 05:06:24PM +0000, Yazen Ghannam wrote: > > From: Akshay Gupta > > > > ...because future AMD systems will support up to 64 MCA banks per CPU. > > > > MAX_NR_BANKS is used to allocate a number of data structures, and it is > > used as a ceiling for values read from MCG_CAP[Count]. Therefore, this > > change will have no functional effect on existing systems with 32 or > > fewer MCA banks per CPU. > > Of course it will, grep for MAX_NR_BANKS and look at all those bitmaps > and arrays which get defined with MAX_NR_BANKS size. With your change, > they will double in size. > > How much does vmlinux size grow with your change? > It seems to get smaller. -rwxrwxr-x 1 yghannam yghannam 807634088 Aug 20 17:51 vmlinux-32banks -rwxrwxr-x 1 yghannam yghannam 807634072 Aug 20 17:50 vmlinux-64banks Any ideas? Maybe there's some alignment change? Or a build issue on my end? Thanks, Yazen
Re: [PATCH] x86/MCE/AMD, EDAC/mce_amd
On Sun, Aug 09, 2020 at 12:35:59PM +0800, Feng zhou wrote: > From: zhoufeng > > The edac_mce_amd module calls decode_dram_ecc() on AMD Family17h and > later systems. This function is used in amd64_edac_mod to do > system-specific decoding for DRAM ECC errors. The function takes a > "NodeId" as a parameter. > > In AMD documentation, NodeId is used to identify a physical die in a > system. This can be used to identify a node in the AMD_NB code and also > it is used with umc_normaddr_to_sysaddr(). > > However, the input used for decode_dram_ecc() is currently the NUMA node > of a logical CPU. so this will cause the address translation function to > fail or report incorrect results. > > Signed-off-by: zhoufeng > --- > drivers/edac/mce_amd.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c > index 325aedf46ff2..73c805113322 100644 > --- a/drivers/edac/mce_amd.c > +++ b/drivers/edac/mce_amd.c > @@ -996,7 +996,7 @@ static void decode_smca_error(struct mce *m) > } > > if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc) > - decode_dram_ecc(cpu_to_node(m->extcpu), m); > + decode_dram_ecc(topology_physical_package_id(m->extcpu), m); This will break on Naples systems, because the NodeId and the physical package ID will not match. I can send a patch soon that will work for Naples, Rome, and later systems. Thanks, Yazen
Re: [PATCH 2/2] x86/MCE/AMD Support new memory interleaving schemes during address translation
On Sat, Aug 15, 2020 at 11:13:36AM +0200, Ingo Molnar wrote: > > * Yazen Ghannam wrote: > > > + /* Read D18F1x208 (System Fabric ID Mask 0). */ > > + if (amd_df_indirect_read(nid, 1, 0x208, umc, &tmp)) > > + goto out_err; > > + > > + /* Determine if system is a legacy Data Fabric type. */ > > + legacy_df = !(tmp & 0xFF); > > 1) > > I see this pattern in a lot of places in the code, first the magic > constant 0x208 is explained a comment, then it is *repeated* and used > it in the code... > > How about introducing an obviously named enum for it instead, which > would then be self-documenting, saving the comment and removing magic > numbers: > > if (amd_df_indirect_read(nid, 1, AMD_REG_FAB_ID, umc, ®_fab_id)) > goto out_err; > > (The symbolic name should be something better, I just guessed > something quickly.) > > Please clean this up in a separate patch, not part of the already > large patch that introduces a new feature. > Okay, will do. > 2) > > 'tmp & 0xFF' is some sort of fabric version ID value, with a value of > 0 denoting legacy (pre-Rome) systems, right? > > How about making that explicit: > > df_version = reg_fab_id & 0xFF; > > I'm pretty sure such a version ID might come handy later on, should > there be quirks or new capabilities with the newer systems ... > Not exactly. The register field is Read-as-Zero on legacy systems. The versions are 2 and 3 where 2 is the "legacy" version. But I can make this change. For example: df_version = reg_fab_id & 0xFF ? 3 : 2; > > > ret_addr -= hi_addr_offset; > > @@ -728,23 +740,31 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, > > u8 umc, u64 *sys_addr) > > } > > > > lgcy_mmio_hole_en = tmp & BIT(1); > > - intlv_num_chan= (tmp >> 4) & 0xF; > > - intlv_addr_sel= (tmp >> 8) & 0x7; > > - dram_base_addr= (tmp & GENMASK_ULL(31, 12)) << 16; > > > > - /* {0, 1, 2, 3} map to address bits {8, 9, 10, 11} respectively */ > > - if (intlv_addr_sel > 3) { > > - pr_err("%s: Invalid interleave address select %d.\n", > > - __func__, intlv_addr_sel); > > - goto out_err; > > + if (legacy_df) { > > + intlv_num_chan= (tmp >> 4) & 0xF; > > + intlv_addr_sel= (tmp >> 8) & 0x7; > > + } else { > > + intlv_num_chan= (tmp >> 2) & 0xF; > > + intlv_num_dies= (tmp >> 6) & 0x3; > > + intlv_num_sockets = (tmp >> 8) & 0x1; > > + intlv_addr_sel= (tmp >> 9) & 0x7; > > } > > > > + dram_base_addr= (tmp & GENMASK_ULL(31, 12)) << 16; > > + > > /* Read D18F0x114 (DramLimitAddress). */ > > if (amd_df_indirect_read(nid, 0, 0x114 + (8 * base), umc, &tmp)) > > goto out_err; > > > > - intlv_num_sockets = (tmp >> 8) & 0x1; > > - intlv_num_dies= (tmp >> 10) & 0x3; > > + if (legacy_df) { > > + intlv_num_sockets = (tmp >> 8) & 0x1; > > + intlv_num_dies= (tmp >> 10) & 0x3; > > + dst_fabric_id = tmp & 0xFF; > > + } else { > > + dst_fabric_id = tmp & 0x3FF; > > + } > > + > > dram_limit_addr = ((tmp & GENMASK_ULL(31, 12)) << 16) | > > GENMASK_ULL(27, 0); > > Could we please structure this code in a bit more readable fashion? > > 1) > > Such as not using the meaningless 'tmp' variable name to first read > out DramOffset, then DramLimitAddress? > IIRC, the "tmp" variable come to be in the review for the patch which added this function. There are a few places where the register name and the value needed have the same or similar name. For example, DramLimitAddress is the register name and also a field within the register. So we'd have a reg_dram_limit_addr and val_dram_limit_addr. The "tmp" variable removes the need for the "reg_" variable. But I think this can be reworked so that the final variable name is reused. The register value can read into the variable, extra fields can be extracted from it, and the final value can be adjusted as needed. > How about naming them a bit more obviously, and retrieving them in a > single step: > > if (amd_df_indirect_read(nid, 0, 0x1B4, umc, ®_dram_off)) > goto out_err; >
Re: [PATCH] EDAC/AMD64: Update scrub register addresses for newer models
On Mon, Jan 18, 2021 at 04:30:58AM +0300, WGH wrote: > On 16/01/2021 17:33, Yazen Ghannam wrote: > > From: Yazen Ghannam > > > > The Family 17h scrubber registers moved to different offset starting > > with Model 30h. The new register offsets are used for all currently > > available models since then. > > > > Use the new register addresses as the defaults. > > > > Set the proper scrub register addresses during module init for older > > models. > > So I tested the patch on my machine (AMD Ryzen 9 3900XT on ASRock B550 > Extreme4 motherboard, Linux 5.10.7). > > The /sys/devices/system/edac/mc/mc0/sdram_scrub_rate value seems to be stuck > at 12284069 right after the boot, and does not change. > Writes to the file do not report any errors. > > dmesg: > > [ 0.549451] EDAC MC: Ver: 3.0.0 > [ 0.817576] EDAC amd64: F17h_M70h detected (node 0). > [ 0.818159] EDAC amd64: Node 0: DRAM ECC enabled. > [ 0.818717] EDAC amd64: MCT channel count: 2 > [ 0.819324] EDAC MC0: Giving out device to module amd64_edac controller > F17h_M70h: DEV :00:18.3 (INTERRUPT) > [ 0.819909] EDAC MC: UMC0 chip selects: > [ 0.819910] EDAC amd64: MC: 0: 16384MB 1: 16384MB > [ 0.820488] EDAC amd64: MC: 2: 16384MB 3: 16384MB > [ 0.821067] EDAC MC: UMC1 chip selects: > [ 0.821067] EDAC amd64: MC: 0: 16384MB 1: 16384MB > [ 0.821630] EDAC amd64: MC: 2: 16384MB 3: 16384MB > [ 0.822187] EDAC amd64: using x16 syndromes. > [ 0.822739] EDAC PCI0: Giving out device to module amd64_edac controller > EDAC PCI controller: DEV :00:18.0 (POLLED) > [ 0.823314] AMD64 EDAC driver v3.5.0 > > Thanks for testing. I'll try to find a similar system and check it out. Thanks, Yazen
Re: [PATCH] EDAC/AMD64: Update scrub register addresses for newer models
On Mon, Jan 18, 2021 at 08:31:12PM +0100, Borislav Petkov wrote: > On Sat, Jan 16, 2021 at 02:33:53PM +0000, Yazen Ghannam wrote: > > +static struct { > > + u32 base, limit; > > +} f17h_scrub_regs = {F17H_M30H_SCR_BASE_ADDR, F17H_M30H_SCR_LIMIT_ADDR}; > > Why not make this part of struct amd64_umc so that you can access them > through pvt->umc? > We have a struct amd64_umc per channel, so putting these fixed values there seemed redundant. Would you mind if we put this in struct amd64_family_type? We can then set the values per family/model group like we do with the max_mcs. Thanks, Yazen
[PATCH] EDAC/AMD64: Update scrub register addresses for newer models
From: Yazen Ghannam The Family 17h scrubber registers moved to different offset starting with Model 30h. The new register offsets are used for all currently available models since then. Use the new register addresses as the defaults. Set the proper scrub register addresses during module init for older models. Reported-by: WGH Signed-off-by: Yazen Ghannam --- drivers/edac/amd64_edac.c | 23 ++- drivers/edac/amd64_edac.h | 2 ++ 2 files changed, 20 insertions(+), 5 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 9868f95a5622..b324b1589e5a 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -167,6 +167,10 @@ static inline int amd64_read_dct_pci_cfg(struct amd64_pvt *pvt, u8 dct, * other archs, we might not have access to the caches directly. */ +static struct { + u32 base, limit; +} f17h_scrub_regs = {F17H_M30H_SCR_BASE_ADDR, F17H_M30H_SCR_LIMIT_ADDR}; + static inline void __f17h_set_scrubval(struct amd64_pvt *pvt, u32 scrubval) { /* @@ -176,10 +180,10 @@ static inline void __f17h_set_scrubval(struct amd64_pvt *pvt, u32 scrubval) */ if (scrubval >= 0x5 && scrubval <= 0x14) { scrubval -= 0x5; - pci_write_bits32(pvt->F6, F17H_SCR_LIMIT_ADDR, scrubval, 0xF); - pci_write_bits32(pvt->F6, F17H_SCR_BASE_ADDR, 1, 0x1); + pci_write_bits32(pvt->F6, f17h_scrub_regs.limit, scrubval, 0xF); + pci_write_bits32(pvt->F6, f17h_scrub_regs.base, 1, 0x1); } else { - pci_write_bits32(pvt->F6, F17H_SCR_BASE_ADDR, 0, 0x1); + pci_write_bits32(pvt->F6, f17h_scrub_regs.base, 0, 0x1); } } /* @@ -257,9 +261,9 @@ static int get_scrub_rate(struct mem_ctl_info *mci) u32 scrubval = 0; if (pvt->umc) { - amd64_read_pci_cfg(pvt->F6, F17H_SCR_BASE_ADDR, &scrubval); + amd64_read_pci_cfg(pvt->F6, f17h_scrub_regs.base, &scrubval); if (scrubval & BIT(0)) { - amd64_read_pci_cfg(pvt->F6, F17H_SCR_LIMIT_ADDR, &scrubval); + amd64_read_pci_cfg(pvt->F6, f17h_scrub_regs.limit, &scrubval); scrubval &= 0xF; scrubval += 0x5; } else { @@ -3568,6 +3572,14 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt) } } +static void f17h_set_scrub_regs(struct amd64_pvt *pvt) +{ + if ((pvt->fam == 0x17 && pvt->model < 0x30) || pvt->fam == 0x18) { + f17h_scrub_regs.base = F17H_SCR_BASE_ADDR; + f17h_scrub_regs.limit = F17H_SCR_LIMIT_ADDR; + } +} + static void setup_mci_misc_attrs(struct mem_ctl_info *mci) { struct amd64_pvt *pvt = mci->pvt_info; @@ -3577,6 +3589,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci) if (pvt->umc) { f17h_determine_edac_ctl_cap(mci, pvt); + f17h_set_scrub_regs(pvt); } else { if (pvt->nbcap & NBCAP_SECDED) mci->edac_ctl_cap |= EDAC_FLAG_SECDED; diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h index 85aa820bc165..4606f72f4258 100644 --- a/drivers/edac/amd64_edac.h +++ b/drivers/edac/amd64_edac.h @@ -213,6 +213,8 @@ #define F15H_M60H_SCRCTRL 0x1C8 #define F17H_SCR_BASE_ADDR 0x48 #define F17H_SCR_LIMIT_ADDR0x4C +#define F17H_M30H_SCR_BASE_ADDR0x40 +#define F17H_M30H_SCR_LIMIT_ADDR 0x44 /* * Function 3 - Misc Control -- 2.25.1
Re: [PATCH 1/2] EDAC/amd64: Merge sysfs debugging attributes setup code
On Tue, Dec 15, 2020 at 12:05:16PM +0100, Borislav Petkov wrote: > From: Borislav Petkov > > There's no need for them to be in a separate file so merge them into the > main driver compilation unit like the other EDAC drivers do. > > Drop now-unneeded function export, make the function static and shorten > static function names. > > No functional changes. > > Signed-off-by: Borislav Petkov Reviewed-by: Yazen Ghannam Thanks, Yazen
Re: [PATCH 2/2] EDAC/amd64: Merge error injection sysfs facilities
On Tue, Dec 15, 2020 at 12:05:17PM +0100, Borislav Petkov wrote: > From: Borislav Petkov > > Merge them into the main driver and put them inside an EDAC_DEBUG > ifdeffery to simplify the driver and have all debugging/injection stuff > behind a debug build-time switch. > > No functional changes. > > Signed-off-by: Borislav Petkov > --- > drivers/edac/Kconfig | 7 +- > drivers/edac/Makefile | 6 +- > drivers/edac/amd64_edac.c | 237 +- > drivers/edac/amd64_edac.h | 8 -- > drivers/edac/amd64_edac_inj.c | 235 - > 5 files changed, 236 insertions(+), 257 deletions(-) > delete mode 100644 drivers/edac/amd64_edac_inj.c > > diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig > index 7a47680d6f07..9c2e719cb86a 100644 > --- a/drivers/edac/Kconfig > +++ b/drivers/edac/Kconfig > @@ -81,10 +81,9 @@ config EDAC_AMD64 > Support for error detection and correction of DRAM ECC errors on > the AMD64 families (>= K8) of memory controllers. > > -config EDAC_AMD64_ERROR_INJECTION > - bool "Sysfs HW Error injection facilities" > - depends on EDAC_AMD64 > - help > + When EDAC_DEBUG is enabled, hardware error injection facilities > + through sysfs are available: > + > Recent Opterons (Family 10h and later) provide for Memory Error Can we say "Opterons (Family 10h to Family 15h)"? It may also apply to Family 16h, but I don't know if they were branded as Opterons. The injection code in this module doesn't apply to Family 17h and later. Also, Family 17h and later doesn't allow the OS direct access to the error injection registers. They're locked down by security policy, etc. > Injection into the ECC detection circuits. The amd64_edac module > allows the operator/user to inject Uncorrectable and Correctable ... > + > +static umode_t inj_is_visible(struct kobject *kobj, struct attribute *attr, > int idx) > +{ > + struct device *dev = kobj_to_dev(kobj); > + struct mem_ctl_info *mci = container_of(dev, struct mem_ctl_info, dev); > + struct amd64_pvt *pvt = mci->pvt_info; > + > + if (pvt->fam < 0x10) Related to the comment above, can this be changed to the following? if (pvt->fam < 0x10 || pvt->fam >= 0x17) > + return 0; > + return attr->mode; > +} > + Everything else looks good to me. Reviewed-by: Yazen Ghannam Thanks, Yazen
[PATCH] EDAC/amd64: Tone down messages about missing PCI IDs
From: Yazen Ghannam Give these messages a debug severity as they are really only useful to the module developers. Also, drop the "(broken BIOS?)" phrase, since this can cause churn for BIOS folks. The PCI IDs needed by the module, at least on modern systems, are fixed in hardware. Signed-off-by: Yazen Ghannam --- drivers/edac/amd64_edac.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index f7087b90..a3770ffee2ea 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -2665,7 +2665,7 @@ reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2) if (pvt->umc) { pvt->F0 = pci_get_related_function(pvt->F3->vendor, pci_id1, pvt->F3); if (!pvt->F0) { - amd64_err("F0 not found, device 0x%x (broken BIOS?)\n", pci_id1); + edac_dbg(1, "F0 not found, device 0x%x\n", pci_id1); return -ENODEV; } @@ -2674,7 +2674,7 @@ reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2) pci_dev_put(pvt->F0); pvt->F0 = NULL; - amd64_err("F6 not found: device 0x%x (broken BIOS?)\n", pci_id2); + edac_dbg(1, "F6 not found: device 0x%x\n", pci_id2); return -ENODEV; } @@ -2691,7 +2691,7 @@ reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2) /* Reserve the ADDRESS MAP Device */ pvt->F1 = pci_get_related_function(pvt->F3->vendor, pci_id1, pvt->F3); if (!pvt->F1) { - amd64_err("F1 not found: device 0x%x (broken BIOS?)\n", pci_id1); + edac_dbg(1, "F1 not found: device 0x%x\n", pci_id1); return -ENODEV; } @@ -2701,7 +2701,7 @@ reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2) pci_dev_put(pvt->F1); pvt->F1 = NULL; - amd64_err("F2 not found: device 0x%x (broken BIOS?)\n", pci_id2); + edac_dbg(1, "F2 not found: device 0x%x\n", pci_id2); return -ENODEV; } -- 2.25.1
[PATCH] arm64: crypto: Add ARM64 CRC32 hw accelerated module
This module registers a crc32 algorithm and a crc32c algorithm that use the optional CRC32 and CRC32C instructions in ARMv8. Tested on AMD Seattle. Improvement compared to crc32c-generic algorithm: TCRYPT CRC32C speed test shows ~450% speedup. Simple dd write tests to btrfs filesystem show ~30% speedup. Signed-off-by: Yazen Ghannam Acked-by: Steve Capper Acked-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 4 + arch/arm64/crypto/Makefile | 4 + arch/arm64/crypto/crc32-arm64.c | 274 3 files changed, 282 insertions(+) create mode 100644 arch/arm64/crypto/crc32-arm64.c diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig index 5562652..c1a0468 100644 --- a/arch/arm64/crypto/Kconfig +++ b/arch/arm64/crypto/Kconfig @@ -50,4 +50,8 @@ config CRYPTO_AES_ARM64_NEON_BLK select CRYPTO_AES select CRYPTO_ABLK_HELPER +config CRYPTO_CRC32_ARM64 + tristate "CRC32 and CRC32C using optional ARMv8 instructions" + depends on ARM64 + select CRYPTO_HASH endif diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile index a3f935f..5720608 100644 --- a/arch/arm64/crypto/Makefile +++ b/arch/arm64/crypto/Makefile @@ -34,5 +34,9 @@ AFLAGS_aes-neon.o := -DINTERLEAVE=4 CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS +obj-$(CONFIG_CRYPTO_CRC32_ARM64) += crc32-arm64.o + +CFLAGS_crc32-arm64.o := -mcpu=generic+crc + $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE $(call if_changed_rule,cc_o_c) diff --git a/arch/arm64/crypto/crc32-arm64.c b/arch/arm64/crypto/crc32-arm64.c new file mode 100644 index 000..9499199 --- /dev/null +++ b/arch/arm64/crypto/crc32-arm64.c @@ -0,0 +1,274 @@ +/* + * crc32-arm64.c - CRC32 and CRC32C using optional ARMv8 instructions + * + * Module based on crypto/crc32c_generic.c + * + * CRC32 loop taken from Ed Nevill's Hadoop CRC patch + * http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3C1403687030.3355.19.camel%40localhost.localdomain%3E + * + * Using inline assembly instead of intrinsics in order to be backwards + * compatible with older compilers. + * + * Copyright (C) 2014 Linaro Ltd + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include +#include +#include +#include +#include +#include + +#include + +MODULE_AUTHOR("Yazen Ghannam "); +MODULE_DESCRIPTION("CRC32 and CRC32C using optional ARMv8 instructions"); +MODULE_LICENSE("GPL v2"); + +#define CRC32X(crc, value) __asm__("crc32x %w[c], %w[c], %x[v]":[c]"+r"(crc):[v]"r"(value)) +#define CRC32W(crc, value) __asm__("crc32w %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value)) +#define CRC32H(crc, value) __asm__("crc32h %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value)) +#define CRC32B(crc, value) __asm__("crc32b %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value)) +#define CRC32CX(crc, value) __asm__("crc32cx %w[c], %w[c], %x[v]":[c]"+r"(crc):[v]"r"(value)) +#define CRC32CW(crc, value) __asm__("crc32cw %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value)) +#define CRC32CH(crc, value) __asm__("crc32ch %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value)) +#define CRC32CB(crc, value) __asm__("crc32cb %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value)) + +static u32 crc32_arm64_le_hw(u32 crc, const u8 *p, unsigned int len) +{ + s64 length = len; + + while ((length -= sizeof(u64)) >= 0) { + CRC32X(crc, get_unaligned_le64(p)); + p += sizeof(u64); + } + + /* The following is more efficient than the straight loop */ + if (length & sizeof(u32)) { + CRC32W(crc, get_unaligned_le32(p)); + p += sizeof(u32); + } + if (length & sizeof(u16)) { + CRC32H(crc, get_unaligned_le16(p)); + p += sizeof(u16); + } + if (length & sizeof(u8)) + CRC32B(crc, *p); + + return crc; +} + +static u32 crc32c_arm64_le_hw(u32 crc, const u8 *p, unsigned int len) +{ + s64 length = len; + + while ((length -= sizeof(u64)) >= 0) { + CRC32CX(crc, get_unaligned_le64(p)); + p += sizeof(u64); + } + + /* The following is more efficient than the straight loop */ + if (length & sizeof(u32)) { + CRC32CW(crc, get_unaligned_le32(p)); + p += sizeof(u32); + } + if (length & sizeof(u16)) { + CRC32CH(crc, get_unaligned_le16(p)); + p += sizeof(u16); + } + if (length & sizeof(u8)) +
Re: [PATCH] arm64: crypto: Add ARM64 CRC32 hw accelerated module
+linux-arm-ker...@lists.infradead.org On Wed, Nov 19, 2014 at 11:19 AM, Yazen Ghannam wrote: > This module registers a crc32 algorithm and a crc32c algorithm > that use the optional CRC32 and CRC32C instructions in ARMv8. > > Tested on AMD Seattle. > > Improvement compared to crc32c-generic algorithm: > TCRYPT CRC32C speed test shows ~450% speedup. > Simple dd write tests to btrfs filesystem show ~30% speedup. > > Signed-off-by: Yazen Ghannam > Acked-by: Steve Capper > Acked-by: Ard Biesheuvel > --- > arch/arm64/crypto/Kconfig | 4 + > arch/arm64/crypto/Makefile | 4 + > arch/arm64/crypto/crc32-arm64.c | 274 > > 3 files changed, 282 insertions(+) > create mode 100644 arch/arm64/crypto/crc32-arm64.c > > diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig > index 5562652..c1a0468 100644 > --- a/arch/arm64/crypto/Kconfig > +++ b/arch/arm64/crypto/Kconfig > @@ -50,4 +50,8 @@ config CRYPTO_AES_ARM64_NEON_BLK > select CRYPTO_AES > select CRYPTO_ABLK_HELPER > > +config CRYPTO_CRC32_ARM64 > + tristate "CRC32 and CRC32C using optional ARMv8 instructions" > + depends on ARM64 > + select CRYPTO_HASH > endif > diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile > index a3f935f..5720608 100644 > --- a/arch/arm64/crypto/Makefile > +++ b/arch/arm64/crypto/Makefile > @@ -34,5 +34,9 @@ AFLAGS_aes-neon.o := -DINTERLEAVE=4 > > CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS > > +obj-$(CONFIG_CRYPTO_CRC32_ARM64) += crc32-arm64.o > + > +CFLAGS_crc32-arm64.o := -mcpu=generic+crc > + > $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE > $(call if_changed_rule,cc_o_c) > diff --git a/arch/arm64/crypto/crc32-arm64.c b/arch/arm64/crypto/crc32-arm64.c > new file mode 100644 > index 000..9499199 > --- /dev/null > +++ b/arch/arm64/crypto/crc32-arm64.c > @@ -0,0 +1,274 @@ > +/* > + * crc32-arm64.c - CRC32 and CRC32C using optional ARMv8 instructions > + * > + * Module based on crypto/crc32c_generic.c > + * > + * CRC32 loop taken from Ed Nevill's Hadoop CRC patch > + * > http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3C1403687030.3355.19.camel%40localhost.localdomain%3E > + * > + * Using inline assembly instead of intrinsics in order to be backwards > + * compatible with older compilers. > + * > + * Copyright (C) 2014 Linaro Ltd > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free Software Foundation. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > + > +MODULE_AUTHOR("Yazen Ghannam "); > +MODULE_DESCRIPTION("CRC32 and CRC32C using optional ARMv8 instructions"); > +MODULE_LICENSE("GPL v2"); > + > +#define CRC32X(crc, value) __asm__("crc32x %w[c], %w[c], > %x[v]":[c]"+r"(crc):[v]"r"(value)) > +#define CRC32W(crc, value) __asm__("crc32w %w[c], %w[c], > %w[v]":[c]"+r"(crc):[v]"r"(value)) > +#define CRC32H(crc, value) __asm__("crc32h %w[c], %w[c], > %w[v]":[c]"+r"(crc):[v]"r"(value)) > +#define CRC32B(crc, value) __asm__("crc32b %w[c], %w[c], > %w[v]":[c]"+r"(crc):[v]"r"(value)) > +#define CRC32CX(crc, value) __asm__("crc32cx %w[c], %w[c], > %x[v]":[c]"+r"(crc):[v]"r"(value)) > +#define CRC32CW(crc, value) __asm__("crc32cw %w[c], %w[c], > %w[v]":[c]"+r"(crc):[v]"r"(value)) > +#define CRC32CH(crc, value) __asm__("crc32ch %w[c], %w[c], > %w[v]":[c]"+r"(crc):[v]"r"(value)) > +#define CRC32CB(crc, value) __asm__("crc32cb %w[c], %w[c], > %w[v]":[c]"+r"(crc):[v]"r"(value)) > + > +static u32 crc32_arm64_le_hw(u32 crc, const u8 *p, unsigned int len) > +{ > + s64 length = len; > + > + while ((length -= sizeof(u64)) >= 0) { > + CRC32X(crc, get_unaligned_le64(p)); > + p += sizeof(u64); > + } > + > + /* The following is more efficient than the straight loop */ > + if (length & sizeof(u32)) { > + CRC32W(crc, get_unaligned_le32(p)); > + p += sizeof(u32); > + } > + if (length & sizeof(u16)) { > + CRC32H(crc, get_unaligned_le16(p)); > + p += sizeof(u16); > + } > + if (length & sizeof(u8)) > +
Re: [PATCH] arm64: crypto: Add ARM64 CRC32 hw accelerated module
Herbert, I have a couple of questions. 1) To which release has the patch been applied? We're just curious for tracking purposes. 2) I'd like to apply Ard's suggestion. Do you prefer a second version of this patch or a separate fixup patch? Thanks, Yazen On Fri, Nov 21, 2014 at 3:39 PM, Ard Biesheuvel wrote: > On 20 November 2014 15:22, Yazen Ghannam wrote: >> +linux-arm-ker...@lists.infradead.org >> >> On Wed, Nov 19, 2014 at 11:19 AM, Yazen Ghannam >> wrote: >>> This module registers a crc32 algorithm and a crc32c algorithm >>> that use the optional CRC32 and CRC32C instructions in ARMv8. >>> >>> Tested on AMD Seattle. >>> >>> Improvement compared to crc32c-generic algorithm: >>> TCRYPT CRC32C speed test shows ~450% speedup. >>> Simple dd write tests to btrfs filesystem show ~30% speedup. >>> >>> Signed-off-by: Yazen Ghannam >>> Acked-by: Steve Capper >>> Acked-by: Ard Biesheuvel >>> --- >>> arch/arm64/crypto/Kconfig | 4 + >>> arch/arm64/crypto/Makefile | 4 + >>> arch/arm64/crypto/crc32-arm64.c | 274 >>> >>> 3 files changed, 282 insertions(+) >>> create mode 100644 arch/arm64/crypto/crc32-arm64.c >>> >>> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig >>> index 5562652..c1a0468 100644 >>> --- a/arch/arm64/crypto/Kconfig >>> +++ b/arch/arm64/crypto/Kconfig >>> @@ -50,4 +50,8 @@ config CRYPTO_AES_ARM64_NEON_BLK >>> select CRYPTO_AES >>> select CRYPTO_ABLK_HELPER >>> >>> +config CRYPTO_CRC32_ARM64 >>> + tristate "CRC32 and CRC32C using optional ARMv8 instructions" >>> + depends on ARM64 >>> + select CRYPTO_HASH >>> endif >>> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile >>> index a3f935f..5720608 100644 >>> --- a/arch/arm64/crypto/Makefile >>> +++ b/arch/arm64/crypto/Makefile >>> @@ -34,5 +34,9 @@ AFLAGS_aes-neon.o := -DINTERLEAVE=4 >>> >>> CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS >>> >>> +obj-$(CONFIG_CRYPTO_CRC32_ARM64) += crc32-arm64.o >>> + >>> +CFLAGS_crc32-arm64.o := -mcpu=generic+crc >>> + >>> $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE >>> $(call if_changed_rule,cc_o_c) >>> diff --git a/arch/arm64/crypto/crc32-arm64.c >>> b/arch/arm64/crypto/crc32-arm64.c >>> new file mode 100644 >>> index 000..9499199 >>> --- /dev/null >>> +++ b/arch/arm64/crypto/crc32-arm64.c >>> @@ -0,0 +1,274 @@ >>> +/* >>> + * crc32-arm64.c - CRC32 and CRC32C using optional ARMv8 instructions >>> + * >>> + * Module based on crypto/crc32c_generic.c >>> + * >>> + * CRC32 loop taken from Ed Nevill's Hadoop CRC patch >>> + * >>> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3C1403687030.3355.19.camel%40localhost.localdomain%3E >>> + * >>> + * Using inline assembly instead of intrinsics in order to be backwards >>> + * compatible with older compilers. >>> + * >>> + * Copyright (C) 2014 Linaro Ltd >>> + * >>> + * This program is free software; you can redistribute it and/or modify >>> + * it under the terms of the GNU General Public License version 2 as >>> + * published by the Free Software Foundation. >>> + */ >>> + >>> +#include > > One final nit: you should not be including this file directly. > You should include instead, and it is up to the > architecture to include either access_ok.h or another implementation > of get_unaligned_leXX > > Granted, the distinction is fairly artificial on arm64, but it does > increase the portability of the code. > > -- > Ard. > > >>> +#include >>> +#include >>> +#include >>> +#include >>> +#include >>> + >>> +#include >>> + >>> +MODULE_AUTHOR("Yazen Ghannam "); >>> +MODULE_DESCRIPTION("CRC32 and CRC32C using optional ARMv8 instructions"); >>> +MODULE_LICENSE("GPL v2"); >>> + >>> +#define CRC32X(crc, value) __asm__("crc32x %w[c], %w[c], >>> %x[v]":[c]"+r"(crc):[v]"r"(value)) >>> +#define CRC32W(crc, value) __asm__("crc32w %w[c], %w[c], >>> %w[v]":[c]"+r"(crc):[v]"r"(value)) >>> +#defin
[PATCH v4 0/8] Decode IA32/X64 CPER
From: Yazen Ghannam This series adds decoding for the IA32/X64 Common Platform Error Record. Patch 1 fixes the IA32/X64 Processor Error Section definition to match the UEFI spec. Patches 2-8 add the new decoding. The patches incrementally add the decoding starting from the top-level "Error Section". Hopefully, this will make reviewing a bit easier compared to one large patch. The formatting of the field names and options is taken from the UEFI spec. I tried to keep everything the same to make searching easier. The patches were written to the UEFI 2.7 spec though the definition of the IA32/X64 CPER seems to be the same as when it was introduced in the UEFI 2.1 spec. Link: https://lkml.kernel.org/r/20180324184940.19762-1-yazen.ghan...@amd.com Changes V3 to V4: * Drop INDENT_SP use. Changes V2 to V3: * Fix table numbers in commit messages. * Don't print raw validation bits. * Only print GUID for unknown error types. Changes V1 to V2: * Remove stable request for all patches. * Address Ard's comments on formatting and other issues. * In Patch 8, always print context info even if the type is not recognized. Yazen Ghannam (8): efi: Fix IA32/X64 Processor Error Record definition efi: Decode IA32/X64 Processor Error Section efi: Decode IA32/X64 Processor Error Info Structure efi: Decode UEFI-defined IA32/X64 Error Structure GUIDs efi: Decode IA32/X64 Cache, TLB, and Bus Check structures efi: Decode additional IA32/X64 Bus Check fields efi: Decode IA32/X64 MS Check structure efi: Decode IA32/X64 Context Info structure drivers/firmware/efi/Kconfig| 5 + drivers/firmware/efi/Makefile | 1 + drivers/firmware/efi/cper-x86.c | 356 drivers/firmware/efi/cper.c | 10 ++ include/linux/cper.h| 4 +- 5 files changed, 375 insertions(+), 1 deletion(-) create mode 100644 drivers/firmware/efi/cper-x86.c -- 2.14.1
[PATCH v4 3/8] efi: Decode IA32/X64 Processor Error Info Structure
From: Yazen Ghannam Print the fields in the IA32/X64 Processor Error Info Structure. Based on UEFI 2.7 Table 253. IA32/X64 Processor Error Information Structure. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180324184940.19762-4-yazen.ghan...@amd.com v3->v4: * Drop INDENT_SP use. v2->v3: * Fix table number in commit message. * Don't print raw validation bits. v1->v2: * Add parantheses around "bits" expression in macro. * Fix indentation on multi-line statements. drivers/firmware/efi/cper-x86.c | 48 + 1 file changed, 48 insertions(+) diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c index 863f0cd2a0ff..e0633a103fcf 100644 --- a/drivers/firmware/efi/cper-x86.c +++ b/drivers/firmware/efi/cper-x86.c @@ -9,9 +9,20 @@ */ #define VALID_LAPIC_ID BIT_ULL(0) #define VALID_CPUID_INFO BIT_ULL(1) +#define VALID_PROC_ERR_INFO_NUM(bits) (((bits) & GENMASK_ULL(7, 2)) >> 2) + +#define INFO_VALID_CHECK_INFO BIT_ULL(0) +#define INFO_VALID_TARGET_ID BIT_ULL(1) +#define INFO_VALID_REQUESTOR_IDBIT_ULL(2) +#define INFO_VALID_RESPONDER_IDBIT_ULL(3) +#define INFO_VALID_IP BIT_ULL(4) void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) { + int i; + struct cper_ia_err_info *err_info; + char newpfx[64]; + if (proc->validation_bits & VALID_LAPIC_ID) printk("%sLocal APIC_ID: 0x%llx\n", pfx, proc->lapic_id); @@ -20,4 +31,41 @@ void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) print_hex_dump(pfx, "", DUMP_PREFIX_OFFSET, 16, 4, proc->cpuid, sizeof(proc->cpuid), 0); } + + snprintf(newpfx, sizeof(newpfx), "%s ", pfx); + + err_info = (struct cper_ia_err_info *)(proc + 1); + for (i = 0; i < VALID_PROC_ERR_INFO_NUM(proc->validation_bits); i++) { + printk("%sError Information Structure %d:\n", pfx, i); + + printk("%sError Structure Type: %pUl\n", newpfx, + &err_info->err_type); + + if (err_info->validation_bits & INFO_VALID_CHECK_INFO) { + printk("%sCheck Information: 0x%016llx\n", newpfx, + err_info->check_info); + } + + if (err_info->validation_bits & INFO_VALID_TARGET_ID) { + printk("%sTarget Identifier: 0x%016llx\n", + newpfx, err_info->target_id); + } + + if (err_info->validation_bits & INFO_VALID_REQUESTOR_ID) { + printk("%sRequestor Identifier: 0x%016llx\n", + newpfx, err_info->requestor_id); + } + + if (err_info->validation_bits & INFO_VALID_RESPONDER_ID) { + printk("%sResponder Identifier: 0x%016llx\n", + newpfx, err_info->responder_id); + } + + if (err_info->validation_bits & INFO_VALID_IP) { + printk("%sInstruction Pointer: 0x%016llx\n", + newpfx, err_info->ip); + } + + err_info++; + } } -- 2.14.1
[PATCH v4 2/8] efi: Decode IA32/X64 Processor Error Section
From: Yazen Ghannam Recognize the IA32/X64 Processor Error Section. Do the section decoding in a new "cper-x86.c" file and add this to the Makefile depending on a new "UEFI_CPER_X86" config option. Print the Local APIC ID and CPUID info from the Processor Error Record. The "Processor Error Info" and "Processor Context" fields will be decoded in following patches. Based on UEFI 2.7 Table 252. Processor Error Record. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180324184940.19762-3-yazen.ghan...@amd.com v3->v4: * No changes. v2->v3: * Fix table number in commit message. * Don't print raw validation bits. v1->v2: * Change config option depends to "X86" instead of "X86_32 || X64_64". * Remove extra newline in Makefile changes. * Drop author copyright line. drivers/firmware/efi/Kconfig| 5 + drivers/firmware/efi/Makefile | 1 + drivers/firmware/efi/cper-x86.c | 23 +++ drivers/firmware/efi/cper.c | 10 ++ include/linux/cper.h| 2 ++ 5 files changed, 41 insertions(+) create mode 100644 drivers/firmware/efi/cper-x86.c diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig index 3098410abad8..781a4a337557 100644 --- a/drivers/firmware/efi/Kconfig +++ b/drivers/firmware/efi/Kconfig @@ -174,6 +174,11 @@ config UEFI_CPER_ARM depends on UEFI_CPER && ( ARM || ARM64 ) default y +config UEFI_CPER_X86 + bool + depends on UEFI_CPER && X86 + default y + config EFI_DEV_PATH_PARSER bool depends on ACPI diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile index cb805374f4bc..5f9f5039de50 100644 --- a/drivers/firmware/efi/Makefile +++ b/drivers/firmware/efi/Makefile @@ -31,3 +31,4 @@ obj-$(CONFIG_ARM) += $(arm-obj-y) obj-$(CONFIG_ARM64)+= $(arm-obj-y) obj-$(CONFIG_EFI_CAPSULE_LOADER) += capsule-loader.o obj-$(CONFIG_UEFI_CPER_ARM)+= cper-arm.o +obj-$(CONFIG_UEFI_CPER_X86)+= cper-x86.o diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c new file mode 100644 index ..863f0cd2a0ff --- /dev/null +++ b/drivers/firmware/efi/cper-x86.c @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (C) 2018, Advanced Micro Devices, Inc. + +#include + +/* + * We don't need a "CPER_IA" prefix since these are all locally defined. + * This will save us a lot of line space. + */ +#define VALID_LAPIC_ID BIT_ULL(0) +#define VALID_CPUID_INFO BIT_ULL(1) + +void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) +{ + if (proc->validation_bits & VALID_LAPIC_ID) + printk("%sLocal APIC_ID: 0x%llx\n", pfx, proc->lapic_id); + + if (proc->validation_bits & VALID_CPUID_INFO) { + printk("%sCPUID Info:\n", pfx); + print_hex_dump(pfx, "", DUMP_PREFIX_OFFSET, 16, 4, proc->cpuid, + sizeof(proc->cpuid), 0); + } +} diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c index c165933ebf38..5a59b582c9aa 100644 --- a/drivers/firmware/efi/cper.c +++ b/drivers/firmware/efi/cper.c @@ -469,6 +469,16 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata cper_print_proc_arm(newpfx, arm_err); else goto err_section_too_small; +#endif +#if defined(CONFIG_UEFI_CPER_X86) + } else if (guid_equal(sec_type, &CPER_SEC_PROC_IA)) { + struct cper_sec_proc_ia *ia_err = acpi_hest_get_payload(gdata); + + printk("%ssection_type: IA32/X64 processor error\n", newpfx); + if (gdata->error_data_length >= sizeof(*ia_err)) + cper_print_proc_ia(newpfx, ia_err); + else + goto err_section_too_small; #endif } else { const void *err = acpi_hest_get_payload(gdata); diff --git a/include/linux/cper.h b/include/linux/cper.h index 4b5f8459b403..9c703a0abe6e 100644 --- a/include/linux/cper.h +++ b/include/linux/cper.h @@ -551,5 +551,7 @@ const char *cper_mem_err_unpack(struct trace_seq *, struct cper_mem_err_compact *); void cper_print_proc_arm(const char *pfx, const struct cper_sec_proc_arm *proc); +void cper_print_proc_ia(const char *pfx, + const struct cper_sec_proc_ia *proc); #endif -- 2.14.1
[PATCH v4 4/8] efi: Decode UEFI-defined IA32/X64 Error Structure GUIDs
From: Yazen Ghannam For easier handling, match the known IA32/X64 error structure GUIDs to enums. Also, print out the name of the matching Error Structure Type. Only print the GUID for unknown types. GUIDs taken from UEFI 2.7 section N.2.4.2.1 IA32/X64 Processor Error Information Structure. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180324184940.19762-5-yazen.ghan...@amd.com v3->v4: * No changes. v2->v3: * Only print raw GUID for unknown error types. v1->v2: * Change use of enum type to u8. * Fix indentation on multi-line statements. drivers/firmware/efi/cper-x86.c | 47 +++-- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c index e0633a103fcf..5438097b93ac 100644 --- a/drivers/firmware/efi/cper-x86.c +++ b/drivers/firmware/efi/cper-x86.c @@ -11,17 +11,53 @@ #define VALID_CPUID_INFO BIT_ULL(1) #define VALID_PROC_ERR_INFO_NUM(bits) (((bits) & GENMASK_ULL(7, 2)) >> 2) +#define INFO_ERR_STRUCT_TYPE_CACHE \ + GUID_INIT(0xA55701F5, 0xE3EF, 0x43DE, 0xAC, 0x72, 0x24, 0x9B, \ + 0x57, 0x3F, 0xAD, 0x2C) +#define INFO_ERR_STRUCT_TYPE_TLB \ + GUID_INIT(0xFC06B535, 0x5E1F, 0x4562, 0x9F, 0x25, 0x0A, 0x3B, \ + 0x9A, 0xDB, 0x63, 0xC3) +#define INFO_ERR_STRUCT_TYPE_BUS \ + GUID_INIT(0x1CF3F8B3, 0xC5B1, 0x49a2, 0xAA, 0x59, 0x5E, 0xEF, \ + 0x92, 0xFF, 0xA6, 0x3C) +#define INFO_ERR_STRUCT_TYPE_MS \ + GUID_INIT(0x48AB7F57, 0xDC34, 0x4f6c, 0xA7, 0xD3, 0xB0, 0xB5, \ + 0xB0, 0xA7, 0x43, 0x14) + #define INFO_VALID_CHECK_INFO BIT_ULL(0) #define INFO_VALID_TARGET_ID BIT_ULL(1) #define INFO_VALID_REQUESTOR_IDBIT_ULL(2) #define INFO_VALID_RESPONDER_IDBIT_ULL(3) #define INFO_VALID_IP BIT_ULL(4) +enum err_types { + ERR_TYPE_CACHE = 0, + ERR_TYPE_TLB, + ERR_TYPE_BUS, + ERR_TYPE_MS, + N_ERR_TYPES +}; + +static enum err_types cper_get_err_type(const guid_t *err_type) +{ + if (guid_equal(err_type, &INFO_ERR_STRUCT_TYPE_CACHE)) + return ERR_TYPE_CACHE; + else if (guid_equal(err_type, &INFO_ERR_STRUCT_TYPE_TLB)) + return ERR_TYPE_TLB; + else if (guid_equal(err_type, &INFO_ERR_STRUCT_TYPE_BUS)) + return ERR_TYPE_BUS; + else if (guid_equal(err_type, &INFO_ERR_STRUCT_TYPE_MS)) + return ERR_TYPE_MS; + else + return N_ERR_TYPES; +} + void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) { int i; struct cper_ia_err_info *err_info; char newpfx[64]; + u8 err_type; if (proc->validation_bits & VALID_LAPIC_ID) printk("%sLocal APIC_ID: 0x%llx\n", pfx, proc->lapic_id); @@ -38,8 +74,15 @@ void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) for (i = 0; i < VALID_PROC_ERR_INFO_NUM(proc->validation_bits); i++) { printk("%sError Information Structure %d:\n", pfx, i); - printk("%sError Structure Type: %pUl\n", newpfx, - &err_info->err_type); + err_type = cper_get_err_type(&err_info->err_type); + printk("%sError Structure Type: %s\n", newpfx, + err_type < ARRAY_SIZE(cper_proc_error_type_strs) ? + cper_proc_error_type_strs[err_type] : "unknown"); + + if (err_type >= N_ERR_TYPES) { + printk("%sError Structure Type: %pUl\n", newpfx, + &err_info->err_type); + } if (err_info->validation_bits & INFO_VALID_CHECK_INFO) { printk("%sCheck Information: 0x%016llx\n", newpfx, -- 2.14.1
[PATCH v4 6/8] efi: Decode additional IA32/X64 Bus Check fields
From: Yazen Ghannam The "Participation Type", "Time Out", and "Address Space" fields are unique to the IA32/X64 Bus Check structure. Print these fields. Based on UEFI 2.7 Table 256. IA32/X64 Bus Check Structure Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180324184940.19762-7-yazen.ghan...@amd.com v3->v4: * No changes. v2->v3: * Fix table number in commit message. v1->v2: * Add parantheses around "check" expression in macro. * Fix indentation on multi-line statements. drivers/firmware/efi/cper-x86.c | 44 + 1 file changed, 44 insertions(+) diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c index f70c46f7a4db..5e6716564dba 100644 --- a/drivers/firmware/efi/cper-x86.c +++ b/drivers/firmware/efi/cper-x86.c @@ -39,6 +39,10 @@ #define CHECK_VALID_RESTARTABLE_IP BIT_ULL(6) #define CHECK_VALID_OVERFLOW BIT_ULL(7) +#define CHECK_VALID_BUS_PART_TYPE BIT_ULL(8) +#define CHECK_VALID_BUS_TIME_OUT BIT_ULL(9) +#define CHECK_VALID_BUS_ADDR_SPACE BIT_ULL(10) + #define CHECK_VALID_BITS(check)(((check) & GENMASK_ULL(15, 0))) #define CHECK_TRANS_TYPE(check)(((check) & GENMASK_ULL(17, 16)) >> 16) #define CHECK_OPERATION(check) (((check) & GENMASK_ULL(21, 18)) >> 18) @@ -49,6 +53,10 @@ #define CHECK_RESTARTABLE_IP BIT_ULL(28) #define CHECK_OVERFLOW BIT_ULL(29) +#define CHECK_BUS_PART_TYPE(check) (((check) & GENMASK_ULL(31, 30)) >> 30) +#define CHECK_BUS_TIME_OUT BIT_ULL(32) +#define CHECK_BUS_ADDR_SPACE(check)(((check) & GENMASK_ULL(34, 33)) >> 33) + enum err_types { ERR_TYPE_CACHE = 0, ERR_TYPE_TLB, @@ -89,6 +97,20 @@ static const char * const ia_check_op_strs[] = { "snoop", }; +static const char * const ia_check_bus_part_type_strs[] = { + "Local Processor originated request", + "Local Processor responded to request", + "Local Processor observed", + "Generic", +}; + +static const char * const ia_check_bus_addr_space_strs[] = { + "Memory Access", + "Reserved", + "I/O", + "Other Transaction", +}; + static inline void print_bool(char *str, const char *pfx, u64 check, u64 bit) { printk("%s%s: %s\n", pfx, str, (check & bit) ? "true" : "false"); @@ -139,6 +161,28 @@ static void print_err_info(const char *pfx, u8 err_type, u64 check) if (validation_bits & CHECK_VALID_OVERFLOW) print_bool("Overflow", pfx, check, CHECK_OVERFLOW); + + if (err_type != ERR_TYPE_BUS) + return; + + if (validation_bits & CHECK_VALID_BUS_PART_TYPE) { + u8 part_type = CHECK_BUS_PART_TYPE(check); + + printk("%sParticipation Type: %u, %s\n", pfx, part_type, + part_type < ARRAY_SIZE(ia_check_bus_part_type_strs) ? + ia_check_bus_part_type_strs[part_type] : "unknown"); + } + + if (validation_bits & CHECK_VALID_BUS_TIME_OUT) + print_bool("Time Out", pfx, check, CHECK_BUS_TIME_OUT); + + if (validation_bits & CHECK_VALID_BUS_ADDR_SPACE) { + u8 addr_space = CHECK_BUS_ADDR_SPACE(check); + + printk("%sAddress Space: %u, %s\n", pfx, addr_space, + addr_space < ARRAY_SIZE(ia_check_bus_addr_space_strs) ? + ia_check_bus_addr_space_strs[addr_space] : "unknown"); + } } void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) -- 2.14.1
[PATCH v4 8/8] efi: Decode IA32/X64 Context Info structure
From: Yazen Ghannam Print the fields of the IA32/X64 Context Information structure. Print the "Register Array" as raw values. Some context types are defined in the UEFI spec, so more detailed decoded may be added in the future. Based on UEFI 2.7 section N.2.4.2.2 IA32/X64 Processor Context Information Structure. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180324184940.19762-9-yazen.ghan...@amd.com v3->v4: * No changes. v2->v3: * No change. v1->v2: * Add parantheses around "bits" expression in macro. * Change VALID_PROC_CNTXT_INFO_NUM to VALID_PROC_CTX_INFO_NUM. * Fix indentation on multi-line statements. * Remove conditional to skip unknown context types. The context info should be printed even if the type is unknown. This is just like what we do for the error information. drivers/firmware/efi/cper-x86.c | 48 + 1 file changed, 48 insertions(+) diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c index 356b8d326219..2531de49f56c 100644 --- a/drivers/firmware/efi/cper-x86.c +++ b/drivers/firmware/efi/cper-x86.c @@ -10,6 +10,7 @@ #define VALID_LAPIC_ID BIT_ULL(0) #define VALID_CPUID_INFO BIT_ULL(1) #define VALID_PROC_ERR_INFO_NUM(bits) (((bits) & GENMASK_ULL(7, 2)) >> 2) +#define VALID_PROC_CXT_INFO_NUM(bits) (((bits) & GENMASK_ULL(13, 8)) >> 8) #define INFO_ERR_STRUCT_TYPE_CACHE \ GUID_INIT(0xA55701F5, 0xE3EF, 0x43DE, 0xAC, 0x72, 0x24, 0x9B, \ @@ -71,6 +72,9 @@ #define CHECK_MS_RESTARTABLE_IPBIT_ULL(22) #define CHECK_MS_OVERFLOW BIT_ULL(23) +#define CTX_TYPE_MSR 1 +#define CTX_TYPE_MMREG 7 + enum err_types { ERR_TYPE_CACHE = 0, ERR_TYPE_TLB, @@ -134,6 +138,17 @@ static const char * const ia_check_ms_error_type_strs[] = { "Internal Unclassified", }; +static const char * const ia_reg_ctx_strs[] = { + "Unclassified Data", + "MSR Registers (Machine Check and other MSRs)", + "32-bit Mode Execution Context", + "64-bit Mode Execution Context", + "FXSAVE Context", + "32-bit Mode Debug Registers (DR0-DR7)", + "64-bit Mode Debug Registers (DR0-DR7)", + "Memory Mapped Registers", +}; + static inline void print_bool(char *str, const char *pfx, u64 check, u64 bit) { printk("%s%s: %s\n", pfx, str, (check & bit) ? "true" : "false"); @@ -242,6 +257,7 @@ void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) { int i; struct cper_ia_err_info *err_info; + struct cper_ia_proc_ctx *ctx_info; char newpfx[64], infopfx[64]; u8 err_type; @@ -305,4 +321,36 @@ void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc) err_info++; } + + ctx_info = (struct cper_ia_proc_ctx *)err_info; + for (i = 0; i < VALID_PROC_CXT_INFO_NUM(proc->validation_bits); i++) { + int size = sizeof(*ctx_info) + ctx_info->reg_arr_size; + int groupsize = 4; + + printk("%sContext Information Structure %d:\n", pfx, i); + + printk("%sRegister Context Type: %s\n", newpfx, + ctx_info->reg_ctx_type < ARRAY_SIZE(ia_reg_ctx_strs) ? + ia_reg_ctx_strs[ctx_info->reg_ctx_type] : "unknown"); + + printk("%sRegister Array Size: 0x%04x\n", newpfx, + ctx_info->reg_arr_size); + + if (ctx_info->reg_ctx_type == CTX_TYPE_MSR) { + groupsize = 8; /* MSRs are 8 bytes wide. */ + printk("%sMSR Address: 0x%08x\n", newpfx, + ctx_info->msr_addr); + } + + if (ctx_info->reg_ctx_type == CTX_TYPE_MMREG) { + printk("%sMM Register Address: 0x%016llx\n", newpfx, + ctx_info->mm_reg_addr); + } + + printk("%sRegister Array:\n", newpfx); + print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, groupsize, + (ctx_info + 1), ctx_info->reg_arr_size, 0); + + ctx_info = (struct cper_ia_proc_ctx *)((long)ctx_info + size); + } } -- 2.14.1
[PATCH v4 7/8] efi: Decode IA32/X64 MS Check structure
From: Yazen Ghannam The IA32/X64 MS Check structure varies from the other Check structures in the the bit positions of its fields, and it includes an additional "Error Type" field. Decode the MS Check structure in a separate function. Based on UEFI 2.7 Table 257. IA32/X64 MS Check Field Description. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180324184940.19762-8-yazen.ghan...@amd.com v3->v4: * No changes. v2->v3: * Fix table number in commit message. v1->v2: * Add parantheses around "check" expression in macro. * Fix indentation on multi-line statements. drivers/firmware/efi/cper-x86.c | 55 - 1 file changed, 54 insertions(+), 1 deletion(-) diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c index 5e6716564dba..356b8d326219 100644 --- a/drivers/firmware/efi/cper-x86.c +++ b/drivers/firmware/efi/cper-x86.c @@ -57,6 +57,20 @@ #define CHECK_BUS_TIME_OUT BIT_ULL(32) #define CHECK_BUS_ADDR_SPACE(check)(((check) & GENMASK_ULL(34, 33)) >> 33) +#define CHECK_VALID_MS_ERR_TYPEBIT_ULL(0) +#define CHECK_VALID_MS_PCC BIT_ULL(1) +#define CHECK_VALID_MS_UNCORRECTED BIT_ULL(2) +#define CHECK_VALID_MS_PRECISE_IP BIT_ULL(3) +#define CHECK_VALID_MS_RESTARTABLE_IP BIT_ULL(4) +#define CHECK_VALID_MS_OVERFLOWBIT_ULL(5) + +#define CHECK_MS_ERR_TYPE(check) (((check) & GENMASK_ULL(18, 16)) >> 16) +#define CHECK_MS_PCC BIT_ULL(19) +#define CHECK_MS_UNCORRECTED BIT_ULL(20) +#define CHECK_MS_PRECISE_IPBIT_ULL(21) +#define CHECK_MS_RESTARTABLE_IPBIT_ULL(22) +#define CHECK_MS_OVERFLOW BIT_ULL(23) + enum err_types { ERR_TYPE_CACHE = 0, ERR_TYPE_TLB, @@ -111,17 +125,56 @@ static const char * const ia_check_bus_addr_space_strs[] = { "Other Transaction", }; +static const char * const ia_check_ms_error_type_strs[] = { + "No Error", + "Unclassified", + "Microcode ROM Parity Error", + "External Error", + "FRC Error", + "Internal Unclassified", +}; + static inline void print_bool(char *str, const char *pfx, u64 check, u64 bit) { printk("%s%s: %s\n", pfx, str, (check & bit) ? "true" : "false"); } +static void print_err_info_ms(const char *pfx, u16 validation_bits, u64 check) +{ + if (validation_bits & CHECK_VALID_MS_ERR_TYPE) { + u8 err_type = CHECK_MS_ERR_TYPE(check); + + printk("%sError Type: %u, %s\n", pfx, err_type, + err_type < ARRAY_SIZE(ia_check_ms_error_type_strs) ? + ia_check_ms_error_type_strs[err_type] : "unknown"); + } + + if (validation_bits & CHECK_VALID_MS_PCC) + print_bool("Processor Context Corrupt", pfx, check, CHECK_MS_PCC); + + if (validation_bits & CHECK_VALID_MS_UNCORRECTED) + print_bool("Uncorrected", pfx, check, CHECK_MS_UNCORRECTED); + + if (validation_bits & CHECK_VALID_MS_PRECISE_IP) + print_bool("Precise IP", pfx, check, CHECK_MS_PRECISE_IP); + + if (validation_bits & CHECK_VALID_MS_RESTARTABLE_IP) + print_bool("Restartable IP", pfx, check, CHECK_MS_RESTARTABLE_IP); + + if (validation_bits & CHECK_VALID_MS_OVERFLOW) + print_bool("Overflow", pfx, check, CHECK_MS_OVERFLOW); +} + static void print_err_info(const char *pfx, u8 err_type, u64 check) { u16 validation_bits = CHECK_VALID_BITS(check); + /* +* The MS Check structure varies a lot from the others, so use a +* separate function for decoding. +*/ if (err_type == ERR_TYPE_MS) - return; + return print_err_info_ms(pfx, validation_bits, check); if (validation_bits & CHECK_VALID_TRANS_TYPE) { u8 trans_type = CHECK_TRANS_TYPE(check); -- 2.14.1
[PATCH v4 5/8] efi: Decode IA32/X64 Cache, TLB, and Bus Check structures
From: Yazen Ghannam Print the common fields of the Cache, TLB, and Bus check structures.The fields of these three check types are the same except for a few more fields in the Bus check structure. The remaining Bus check structure fields will be decoded in a following patch. Based on UEFI 2.7, Table 254. IA32/X64 Cache Check Structure Table 255. IA32/X64 TLB Check Structure Table 256. IA32/X64 Bus Check Structure Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180324184940.19762-6-yazen.ghan...@amd.com v3->v4: * Drop INDENT_SP use. v2->v3: * Fix table numbers in commit message. * Don't print raw validation bits. v1->v2: * Add parantheses around "check" expression in macro. * Change use of enum type to u8. * Fix indentation on multi-line statements. drivers/firmware/efi/cper-x86.c | 99 - 1 file changed, 98 insertions(+), 1 deletion(-) diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c index 5438097b93ac..f70c46f7a4db 100644 --- a/drivers/firmware/efi/cper-x86.c +++ b/drivers/firmware/efi/cper-x86.c @@ -30,6 +30,25 @@ #define INFO_VALID_RESPONDER_IDBIT_ULL(3) #define INFO_VALID_IP BIT_ULL(4) +#define CHECK_VALID_TRANS_TYPE BIT_ULL(0) +#define CHECK_VALID_OPERATION BIT_ULL(1) +#define CHECK_VALID_LEVEL BIT_ULL(2) +#define CHECK_VALID_PCCBIT_ULL(3) +#define CHECK_VALID_UNCORRECTEDBIT_ULL(4) +#define CHECK_VALID_PRECISE_IP BIT_ULL(5) +#define CHECK_VALID_RESTARTABLE_IP BIT_ULL(6) +#define CHECK_VALID_OVERFLOW BIT_ULL(7) + +#define CHECK_VALID_BITS(check)(((check) & GENMASK_ULL(15, 0))) +#define CHECK_TRANS_TYPE(check)(((check) & GENMASK_ULL(17, 16)) >> 16) +#define CHECK_OPERATION(check) (((check) & GENMASK_ULL(21, 18)) >> 18) +#define CHECK_LEVEL(check) (((check) & GENMASK_ULL(24, 22)) >> 22) +#define CHECK_PCC BIT_ULL(25) +#define CHECK_UNCORRECTED BIT_ULL(26) +#define CHECK_PRECISE_IP BIT_ULL(27) +#define CHECK_RESTARTABLE_IP BIT_ULL(28) +#define CHECK_OVERFLOW BIT_ULL(29) + enum err_types { ERR_TYPE_CACHE = 0, ERR_TYPE_TLB, @@ -52,11 +71,81 @@ static enum err_types cper_get_err_type(const guid_t *err_type) return N_ERR_TYPES; } +static const char * const ia_check_trans_type_strs[] = { + "Instruction", + "Data Access", + "Generic", +}; + +static const char * const ia_check_op_strs[] = { + "generic error", + "generic read", + "generic write", + "data read", + "data write", + "instruction fetch", + "prefetch", + "eviction", + "snoop", +}; + +static inline void print_bool(char *str, const char *pfx, u64 check, u64 bit) +{ + printk("%s%s: %s\n", pfx, str, (check & bit) ? "true" : "false"); +} + +static void print_err_info(const char *pfx, u8 err_type, u64 check) +{ + u16 validation_bits = CHECK_VALID_BITS(check); + + if (err_type == ERR_TYPE_MS) + return; + + if (validation_bits & CHECK_VALID_TRANS_TYPE) { + u8 trans_type = CHECK_TRANS_TYPE(check); + + printk("%sTransaction Type: %u, %s\n", pfx, trans_type, + trans_type < ARRAY_SIZE(ia_check_trans_type_strs) ? + ia_check_trans_type_strs[trans_type] : "unknown"); + } + + if (validation_bits & CHECK_VALID_OPERATION) { + u8 op = CHECK_OPERATION(check); + + /* +* CACHE has more operation types than TLB or BUS, though the +* name and the order are the same. +*/ + u8 max_ops = (err_type == ERR_TYPE_CACHE) ? 9 : 7; + + printk("%sOperation: %u, %s\n", pfx, op, + op < max_ops ? ia_check_op_strs[op] : "unknown"); + } + + if (validation_bits & CHECK_VALID_LEVEL) + printk("%sLevel: %llu\n", pfx, CHECK_LEVEL(check)); + + if (validation_bits & CHECK_VALID_PCC) + print_bool("Processor Context Corrupt", pfx, check, CHECK_PCC); + + if (validation_bits & CHECK_VALID_UNCORRECTED) + print_bool("Uncorrected", pfx, check, CHECK_UNCORRECTED); + + if (validation_bits & CHECK_VALID_PRECISE_IP) + print_bool("Precise IP", pfx, check, CHECK_PRECISE_IP); + + if (validation_bits & CHECK_VALID_RESTARTABLE_IP) + print_bool("Restartable IP", pfx, check, CHECK_RESTA
[PATCH v4 1/8] efi: Fix IA32/X64 Processor Error Record definition
From: Yazen Ghannam Based on UEFI 2.7 Table 255. Processor Error Record, the "Local APIC_ID" field is 8 bytes but Linux defines this field as 1 byte. Fix this in the struct cper_sec_proc_ia definition. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20180324184940.19762-2-yazen.ghan...@amd.com v3->v4: * No changes. v2->v3: * Fix table number in commit message. v1->v2: * No changes include/linux/cper.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/cper.h b/include/linux/cper.h index d14ef4e77c8a..4b5f8459b403 100644 --- a/include/linux/cper.h +++ b/include/linux/cper.h @@ -381,7 +381,7 @@ struct cper_sec_proc_generic { /* IA32/X64 Processor Error Section */ struct cper_sec_proc_ia { __u64 validation_bits; - __u8lapic_id; + __u64 lapic_id; __u8cpuid[48]; }; -- 2.14.1
[PATCH] x86/smpboot: Don't do mwait_play_dead() on AMD systems
From: Yazen Ghannam Recent AMD systems support using MWAIT for C1 state. However, MWAIT will not allow deeper cstates than C1 on current systems. With play_dead() we expect the OS to use the deepest state available. The deepest state available on AMD systems is reached through SystemIO or HALT. If MWAIT is available, we use it instead of the other methods, so we never reach the deepest state. Don't try to use MWAIT to play_dead() on AMD systems. Instead, we'll use CPUIDLE to enter the deepest state advertised by firmware. If CPUIDLE is not available then we fallback to HALT. Signed-off-by: Yazen Ghannam --- arch/x86/kernel/smpboot.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index ff99e2b6fc54..67cf00b25f83 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1536,6 +1536,9 @@ static inline void mwait_play_dead(void) void *mwait_ptr; int i; + /* Don't try native MWAIT on AMD. Stick to CPUIDLE and HALT. */ + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) + return; if (!this_cpu_has(X86_FEATURE_MWAIT)) return; if (!this_cpu_has(X86_FEATURE_CLFLUSH)) -- 2.14.1
[PATCH] x86/MCE, EDAC/mce_amd: Save all aux registers on SMCA systems
From: Yazen Ghannam The Intel SDM and AMD APM both state that the auxiliary MCA registers should be read if their respective valid bits are set in MCA_STATUS. The Processor Programming Reference for AMD Fam17h systems has a new recommendation that the auxiliary registers should be saved unconditionally. This recommendation can be retroactively applied to older AMD systems. However, we only need to apply this to SMCA systems to avoid modifying behavior on older systems. Define a separate function to save all auxiliary registers on SMCA systems. Call this function from both the MCE handlers and the AMD LVT interrupt handlers so that we don't duplicate code. Print all auxiliary registers in EDAC/mce_amd. Don't restrict this to SMCA systems in order to save a conditional and keep the format similar between SMCA and non-SMCA systems. Signed-off-by: Yazen Ghannam --- Links: https://lkml.kernel.org/r/20180326191526.64314-1-yazen.ghan...@amd.com https://lkml.kernel.org/r/20180326191526.64314-2-yazen.ghan...@amd.com arch/x86/kernel/cpu/mcheck/mce-internal.h | 6 +++ arch/x86/kernel/cpu/mcheck/mce.c | 20 ++ arch/x86/kernel/cpu/mcheck/mce_amd.c | 65 +-- drivers/edac/mce_amd.c| 12 ++ 4 files changed, 57 insertions(+), 46 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h index 374d1aa66952..67a2c7c095ca 100644 --- a/arch/x86/kernel/cpu/mcheck/mce-internal.h +++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h @@ -59,6 +59,12 @@ static inline void mce_intel_hcpu_update(unsigned long cpu) { } static inline void cmci_disable_bank(int bank) { } #endif +#ifdef CONFIG_X86_MCE_AMD +bool smca_read_aux(struct mce *m, int bank); +#else +static inline bool smca_read_aux(struct mce *m, int bank) { return false; } +#endif + void mce_timer_kick(unsigned long interval); #ifdef CONFIG_ACPI_APEI diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 42cf2880d0ed..6be63e9e067d 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -639,6 +639,9 @@ static struct notifier_block mce_default_nb = { */ static void mce_read_aux(struct mce *m, int i) { + if (smca_read_aux(m, i)) + return; + if (m->status & MCI_STATUS_MISCV) m->misc = mce_rdmsrl(msr_ops.misc(i)); @@ -653,23 +656,6 @@ static void mce_read_aux(struct mce *m, int i) m->addr >>= shift; m->addr <<= shift; } - - /* -* Extract [55:] where lsb is the least significant -* *valid* bit of the address bits. -*/ - if (mce_flags.smca) { - u8 lsb = (m->addr >> 56) & 0x3f; - - m->addr &= GENMASK_ULL(55, lsb); - } - } - - if (mce_flags.smca) { - m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i)); - - if (m->status & MCI_STATUS_SYNDV) - m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i)); } } diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index f7666eef4a87..b00d5fff1848 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -244,6 +244,47 @@ static void smca_configure(unsigned int bank, unsigned int cpu) } } + +static bool _smca_read_aux(struct mce *m, int bank, bool read_addr) +{ + if (!mce_flags.smca) + return false; + + rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid); + rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd); + + /* +* We should already have a value if we're coming from the Threshold LVT +* interrupt handler. Otherwise, read it now. +*/ + if (!m->misc) + rdmsrl(msr_ops.misc(bank), m->misc); + + /* +* Read MCA_ADDR if we don't have it already. We should already have it +* if we're coming from the interrupt handlers. +*/ + if (read_addr) + rdmsrl(msr_ops.addr(bank), m->addr); + + /* +* Extract [55:] where lsb is the least significant +* *valid* bit of the address bits. +*/ + if (m->addr) { + u8 lsb = (m->addr >> 56) & 0x3f; + + m->addr &= GENMASK_ULL(55, lsb); + } + + return true; +} + +bool smca_read_aux(struct mce *m, int bank) +{ + return _smca_read_aux(m, bank, true); +} + struct thresh_restart { struct threshold_block *b; int reset; @@ -799,30 +840,12 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc) mce_setup(&m); m.status = statu
Re: [PATCH] x86/AMD: Fix LLC ID for AMD Fam17h systems
>> +/* >> + * LLC is at the Core Complex level. >> + * Core Complex Id is ApicId[3]. >> + */ >> +else if (c->x86 == 0x17) >> +per_cpu(cpu_llc_id, cpu) = c->initial_apicid >> >> 3; > > This whole if/else block lacks curly braces. See: > > https://marc.info/?l=linux-kernel&m=147351236615103 > Okay, understood. >> -/* >> - * Fix percpu cpu_llc_id here as LLC topology is different >> - * for Fam17h systems. >> - */ >> - if (c->x86 != 0x17 || !cpuid_edx(0x8006)) >> -return; >> - >> -socket_id = (c->apicid >> bits) - 1; >> -core_complex_id = (c->apicid & ((1 << bits) - 1)) >> 3; >> - >> -per_cpu(cpu_llc_id, cpu) = (socket_id << 3) | core_complex_id; > > So if I've read the patch correctly then the trivial fix for fam17h would > have been: > >> +per_cpu(cpu_llc_id, cpu) = c->initial_apicid >> 3; > > Right? > Right. > And this one liner wants to be a seperate patch with a proper > explanation. And that simple hunk can be tagged for stable. > Okay, I'll make it a separate patch. It still wouldn't be a one-liner because then socket_id and core_complex_id would be left unused and should be removed. > The rest of the patch is cleanup and improvement and want's to be seperated > out and explained proper. > Okay, will do. Thanks, Yazen
[PATCH v2 2/2] x86/AMD: Group cpu_llc_id assignment by topology feature and family
Currently, we assume that a system has multiple last level caches only if there are multiple nodes, and that the cpu_llc_id is equal to the node_id. This no longer applies since Fam17h can have multiple last level caches within a node. So group the cpu_llc_id assignment by topology feature and family. The NODEID_MSR feature only applies to Fam10h in which case the llc is at the node level. The TOPOEXT feature is used on families 15h, 16h and 17h. So far we only see multiple last level caches if L3 caches are available. Otherwise, the cpu_llc_id will default to be the phys_proc_id. We have L3 caches only on families 15h and 17h. On Fam15h, the llc is at the node level. On Fam17h, the llc is at the core complex level and can be found by right shifting the apicid. Also, keep the family checks explicit so that new families will fall back to the default. Single node systems in families 10h and 15h will have a Node ID of 0 which will be the same as the phys_proc_id, so we don't need to check for multiple nodes before using the node_id. Signed-off-by: Yazen Ghannam --- arch/x86/kernel/cpu/amd.c | 32 1 file changed, 20 insertions(+), 12 deletions(-) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index c3fc337..be70345 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -314,11 +314,31 @@ static void amd_get_topology(struct cpuinfo_x86 *c) smp_num_siblings = ((ebx >> 8) & 3) + 1; c->x86_max_cores /= smp_num_siblings; c->cpu_core_id = ebx & 0xff; + + /* +* We may have multiple LLCs if L3 caches exist, so check if we +* have an L3 cache by looking at the L3 cache cpuid leaf. +*/ + if (cpuid_edx(0x8006)) { + if (c->x86 == 0x15) { + /* LLC is at the node level. */ + per_cpu(cpu_llc_id, cpu) = node_id; + + } else if (c->x86 == 0x17) { + /* +* LLC is at the core complex level. +* Core complex id is ApicId[3]. +*/ + per_cpu(cpu_llc_id, cpu) = c->apicid >> 3; + } + } } else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) { u64 value; rdmsrl(MSR_FAM10H_NODE_ID, value); node_id = value & 7; + + per_cpu(cpu_llc_id, cpu) = node_id; } else return; @@ -329,9 +349,6 @@ static void amd_get_topology(struct cpuinfo_x86 *c) set_cpu_cap(c, X86_FEATURE_AMD_DCM); cus_per_node = c->x86_max_cores / nodes_per_socket; - /* store NodeID, use llc_shared_map to store sibling info */ - per_cpu(cpu_llc_id, cpu) = node_id; - /* core id has to be in the [0 .. cores_per_node - 1] range */ c->cpu_core_id %= cus_per_node; } @@ -356,15 +373,6 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c) /* use socket ID also for last level cache */ per_cpu(cpu_llc_id, cpu) = c->phys_proc_id; amd_get_topology(c); - - /* -* Fix percpu cpu_llc_id here as LLC topology is different -* for Fam17h systems. -*/ -if (c->x86 != 0x17 || !cpuid_edx(0x8006)) - return; - - per_cpu(cpu_llc_id, cpu) = c->apicid >> 3; #endif } -- 1.9.1
[PATCH v2 1/2] x86/AMD: Fix cpu_llc_id for AMD Fam17h systems
The current Fam17h cpu_llc_id derivation has an underflow bug when extracting the socket_id value. The socket_id value starts from 0, so subtracting 1 will result in an underflow. This breaks scheduling topology later on since the cpu_llc_id will be incorrect. The apicid decoding is fixed for bits 3 and above, which give the core complex, node and socket IDs. The LLC is at the core complex level so we can find a unique cpu_llc_id by right shifting the apicid by 3. We can fix the underflow bug and simplify the code by replacing the current cpu_llc_id derivation with a right shift. Signed-off-by: Yazen Ghannam Cc: # v4.4.. Fixes: 3849e91f571d ("x86/AMD: Fix last level cache topology for AMD Fam17h systems") --- arch/x86/kernel/cpu/amd.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 7b76eb6..c3fc337 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -347,7 +347,6 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c) #ifdef CONFIG_SMP unsigned bits; int cpu = smp_processor_id(); - unsigned int socket_id, core_complex_id; bits = c->x86_coreid_bits; /* Low order bits define the core id (index of core in socket) */ @@ -365,10 +364,7 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c) if (c->x86 != 0x17 || !cpuid_edx(0x8006)) return; - socket_id = (c->apicid >> bits) - 1; - core_complex_id = (c->apicid & ((1 << bits) - 1)) >> 3; - - per_cpu(cpu_llc_id, cpu) = (socket_id << 3) | core_complex_id; + per_cpu(cpu_llc_id, cpu) = c->apicid >> 3; #endif } -- 1.9.1
Re: [PATCH -v1.1] x86/topology: Document cpu_llc_id
On Thu, Nov 17, 2016 at 10:45:57AM +0100, Borislav Petkov wrote: > It means different things on Intel and AMD so write it down so that > there's no confusion. > > Signed-off-by: Borislav Petkov > Cc: Peter Zijlstra > Cc: Thomas Gleixner > Cc: Yazen Ghannam > --- > Documentation/x86/topology.txt | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt > index 06afac252f5b..f3e9d7e9ed6c 100644 > --- a/Documentation/x86/topology.txt > +++ b/Documentation/x86/topology.txt > @@ -63,6 +63,15 @@ The topology of a system is described in the units of: > The maximum possible number of packages in the system. Helpful for per > package facilities to preallocate per package information. > > + - cpu_llc_id: > + > +A per-CPU variable containing: > +- On Intel, the first APIC ID of the list of CPUs sharing the Last Level > +Cache > + > +- On AMD, the Node ID or Core Complex ID containing the Last Level > +Cache. In general, it is a number identifying an LLC uniquely on the > +system. > > * Cores: > Look good to me. Thanks, Yazen
Re: [tip:ras/core] x86/RAS: Simplify SMCA HWID descriptor struct
> static void get_smca_bank_info(unsigned int bank) > { > unsigned int i, hwid_mcatype, cpu = smp_processor_id(); > - struct smca_hwid_mcatype *type; > + struct smca_hwid *s_hwid; > u32 high, instance_id; > - u16 hwid, mcatype; > > /* Collect bank_info using CPU 0 for now. */ > if (cpu) > @@ -162,14 +157,13 @@ static void get_smca_bank_info(unsigned int bank) > return; > } > > - hwid = high & MCI_IPID_HWID; > - mcatype = (high & MCI_IPID_MCATYPE) >> 16; > - hwid_mcatype = HWID_MCATYPE(hwid, mcatype); > + hwid_mcatype = HWID_MCATYPE(high & MCI_IPID_HWID, > + (high & MCI_IPID_MCATYPE) >> 16); > Sorry for catching this late, but it seems this change doesn't compile correctly. This causes the value of hwid_mcatype to be incorrect, so we will never match a bank to its type. I see this with GCC 4.8.5 and 5.4.0. There are no warnings or issues when building or booting just that the behavior is incorrect. Disassembly of above change: db: 8b 45 e0mov-0x20(%rbp),%eax de: 41 89 c4mov%eax,%r12d e1: 25 00 00 ff 0f and$0xfff,%eax e6: 41 c1 ec 10 shr$0x10,%r12d ea: 41 09 c4or %eax,%r12d Disassembly of original code: 286: 8b 45 d0mov-0x30(%rbp),%eax 289: 41 89 c5mov%eax,%r13d 28c: c1 e8 10shr$0x10,%eax 28f: 41 81 e5 ff 0f 00 00and$0xfff,%r13d 296: 41 c1 e5 10 shl$0x10,%r13d 29a: 41 09 c5or %eax,%r13d Adding extra parentheses in HWID_MCATYPE() gives the same assembly as the original code and fixes the behavior. > + hwid_mcatype = HWID_MCATYPE((high & MCI_IPID_HWID)), > + ((high & MCI_IPID_MCATYPE) >> 16)); Thanks, Yazen
Re: [tip:ras/core] x86/RAS: Simplify SMCA HWID descriptor struct
> > > > Argh, the macro should be adding the additional parentheses: > > > > #define HWID_MCATYPE(hwid, mcatype) (((hwid) << 16) | (mcatype)) > > > > That should fix the issue too. > Yep, sure does. > Patch please. Will do. Thanks, Yazen
Re: [PATCH v2 2/2] x86/AMD: Group cpu_llc_id assignment by topology feature and family
> > > The NODEID_MSR feature only applies to Fam10h in which case the llc is at > > s/llc/LLC (Last Level Cache/ > > Let's try to have abbreviations written out in their first mention in the > text. > Okay. > > the node level. > > > > The TOPOEXT feature is used on families 15h, 16h and 17h. So far we only > > see multiple last level caches if L3 caches are available. Otherwise, the > > cpu_llc_id will default to be the phys_proc_id. We have L3 caches only on > > families 15h and 17h. On Fam15h, the llc is at the node level. On Fam17h, > > s/llc/LLC/g > Ack. > > the llc is at the core complex level and can be found by right shifting the > ^^^ > > LLC > Ack. > > + > > + /* > > +* We may have multiple LLCs if L3 caches exist, so check if we > > +* have an L3 cache by looking at the L3 cache cpuid leaf. > > x86 instructions in caps please: CPUID > Ack. > > +*/ > > + if (cpuid_edx(0x8006)) { > > + if (c->x86 == 0x15) { > > + /* LLC is at the node level. */ > > + per_cpu(cpu_llc_id, cpu) = node_id; > > + > > + } else if (c->x86 == 0x17) { > > How about >= ? > This APICID format is only valid for Fam17h. What I'm going for is that we fall back to a sensible default if we don't have a better assignment for a new family. At first I thought that phys_proc_id would be good but now I think node_id is better as a sensible default. I'll make this change in the V3 set. > > Btw, please add for your next submission: > > Tested-by: Borislav Petkov > For both patches? Thanks, Yazen
Re: [PATCH v2 1/2] x86/AMD: Fix cpu_llc_id for AMD Fam17h systems
> > The current Fam17h cpu_llc_id derivation has an underflow bug when > ^ > (Last Level Cache ID) > > Let's write it out the first time. > Ack. > > > > The apicid decoding is fixed for bits 3 and above, > ^ > "is fixed ... in register... " > Ack. > > which give the core > > complex, node and socket IDs. The LLC is at the core complex level so we > > can find a unique cpu_llc_id by right shifting the apicid by 3. > > "... because then the LSBit will be the Core Complex ID." > Ack. > > We can fix the underflow bug and simplify the code by replacing the > > current cpu_llc_id derivation with a right shift. > > > > Signed-off-by: Yazen Ghannam > > Cc: # v4.4.. > > Fixes: 3849e91f571d ("x86/AMD: Fix last level cache topology for AMD Fam17h > > systems") > > --- > > I'll run it soon. > Thanks, Yazen
[PATCH v3 2/2] x86/AMD: Group cpu_llc_id assignment by topology feature and family
Currently, we assume that a system has multiple last level caches only if there are multiple nodes, and that the cpu_llc_id is equal to the node_id. This no longer applies since Fam17h can have multiple last level caches within a node. So group the cpu_llc_id assignment by topology feature and family. The NODEID_MSR feature only applies to Fam10h in which case the LLC (Last Level Cache) is at the node level. The TOPOEXT feature is used on families 15h, 16h and 17h. So far we only see multiple last level caches if L3 caches are available. Otherwise, the cpu_llc_id will default to be the phys_proc_id. We have L3 caches only on families 15h and 17h. On Fam15h, the LLC is at the node level. On Fam17h, the LLC is at the core complex level and can be found by right shifting the apicid. Also, keep the family checks explicit so that new families will fall back to the default, which will be node_id for TOPOEXT systems. Single node systems in families 10h and 15h will have a Node ID of 0 which will be the same as the phys_proc_id, so we don't need to check for multiple nodes before using the node_id. Signed-off-by: Yazen Ghannam --- Link: http://lkml.kernel.org/r/1477669918-56261-2-git-send-email-yazen.ghan...@amd.com v2->v3: * Fixup commit message based on comments. * Make node_id the default cpu_llc_id for TOPOEXT systems. arch/x86/kernel/cpu/amd.c | 31 +++ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 1e81a37..4daad1e 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -314,11 +314,30 @@ static void amd_get_topology(struct cpuinfo_x86 *c) smp_num_siblings = ((ebx >> 8) & 3) + 1; c->x86_max_cores /= smp_num_siblings; c->cpu_core_id = ebx & 0xff; + + /* +* We may have multiple LLCs if L3 caches exist, so check if we +* have an L3 cache by looking at the L3 cache CPUID leaf. +*/ + if (cpuid_edx(0x8006)) { + if (c->x86 == 0x17) { + /* +* LLC is at the core complex level. +* Core complex id is ApicId[3]. +*/ + per_cpu(cpu_llc_id, cpu) = c->apicid >> 3; + } else { + /* LLC is at the node level. */ + per_cpu(cpu_llc_id, cpu) = node_id; + } + } } else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) { u64 value; rdmsrl(MSR_FAM10H_NODE_ID, value); node_id = value & 7; + + per_cpu(cpu_llc_id, cpu) = node_id; } else return; @@ -329,9 +348,6 @@ static void amd_get_topology(struct cpuinfo_x86 *c) set_cpu_cap(c, X86_FEATURE_AMD_DCM); cus_per_node = c->x86_max_cores / nodes_per_socket; - /* store NodeID, use llc_shared_map to store sibling info */ - per_cpu(cpu_llc_id, cpu) = node_id; - /* core id has to be in the [0 .. cores_per_node - 1] range */ c->cpu_core_id %= cus_per_node; } @@ -356,15 +372,6 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c) /* use socket ID also for last level cache */ per_cpu(cpu_llc_id, cpu) = c->phys_proc_id; amd_get_topology(c); - - /* -* Fix percpu cpu_llc_id here as LLC topology is different -* for Fam17h systems. -*/ -if (c->x86 != 0x17 || !cpuid_edx(0x8006)) - return; - - per_cpu(cpu_llc_id, cpu) = c->apicid >> 3; #endif } -- 1.9.1
[PATCH v3 1/2] x86/AMD: Fix cpu_llc_id for AMD Fam17h systems
The current Fam17h cpu_llc_id (Last Level Cache ID) derivation has an underflow bug when extracting the socket_id value. The socket_id value starts from 0, so subtracting 1 will result in an underflow. This breaks scheduling topology later on since the cpu_llc_id will be incorrect. The APICID decoding is fixed, in register, for bits 3 and above, which give the core complex, node and socket IDs. The LLC is at the core complex level so we can find a unique cpu_llc_id by right shifting the APICID by 3 because then the least significant bit will be the Core Complex ID. We can fix the underflow bug and simplify the code by replacing the current cpu_llc_id derivation with a right shift. Signed-off-by: Yazen Ghannam Cc: # v4.4.. Fixes: 3849e91f571d ("x86/AMD: Fix last level cache topology for AMD Fam17h systems") --- Link: http://lkml.kernel.org/r/1477669918-56261-1-git-send-email-yazen.ghan...@amd.com v2->v3: * Fixup commit message based on comments. arch/x86/kernel/cpu/amd.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index b81fe2d..1e81a37 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -347,7 +347,6 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c) #ifdef CONFIG_SMP unsigned bits; int cpu = smp_processor_id(); - unsigned int socket_id, core_complex_id; bits = c->x86_coreid_bits; /* Low order bits define the core id (index of core in socket) */ @@ -365,10 +364,7 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c) if (c->x86 != 0x17 || !cpuid_edx(0x8006)) return; - socket_id = (c->apicid >> bits) - 1; - core_complex_id = (c->apicid & ((1 << bits) - 1)) >> 3; - - per_cpu(cpu_llc_id, cpu) = (socket_id << 3) | core_complex_id; + per_cpu(cpu_llc_id, cpu) = c->apicid >> 3; #endif } -- 1.9.1
[PATCH] x86/AMD: Fix LLC ID for AMD Fam17h systems
Fix an underflow bug with the current Fam17h LLC ID derivation by simplifying the derivation, and also move it into amd_get_topology(). Signed-off-by: Yazen Ghannam Cc: sta...@vger.kernel.org # v4.6.. Fixes: 3849e91f571d ("x86/AMD: Fix last level cache topology for AMD Fam17h systems") --- arch/x86/kernel/cpu/amd.c | 35 +++ 1 file changed, 19 insertions(+), 16 deletions(-) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index b81fe2d..babb93a 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -314,11 +314,30 @@ static void amd_get_topology(struct cpuinfo_x86 *c) smp_num_siblings = ((ebx >> 8) & 3) + 1; c->x86_max_cores /= smp_num_siblings; c->cpu_core_id = ebx & 0xff; + + /* +* We may have multiple LLCs if L3 Caches exist, so check if we +* have an L3 cache by looking at the L3 cache cpuid leaf. +*/ + if (cpuid_edx(0x8006)) { + /* LLC is at the Node level. */ + if (c->x86 == 0x15) + per_cpu(cpu_llc_id, cpu) = node_id; + + /* +* LLC is at the Core Complex level. +* Core Complex Id is ApicId[3]. +*/ + else if (c->x86 == 0x17) + per_cpu(cpu_llc_id, cpu) = c->initial_apicid >> 3; + } } else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) { u64 value; rdmsrl(MSR_FAM10H_NODE_ID, value); node_id = value & 7; + + per_cpu(cpu_llc_id, cpu) = node_id; } else return; @@ -329,9 +348,6 @@ static void amd_get_topology(struct cpuinfo_x86 *c) set_cpu_cap(c, X86_FEATURE_AMD_DCM); cus_per_node = c->x86_max_cores / nodes_per_socket; - /* store NodeID, use llc_shared_map to store sibling info */ - per_cpu(cpu_llc_id, cpu) = node_id; - /* core id has to be in the [0 .. cores_per_node - 1] range */ c->cpu_core_id %= cus_per_node; } @@ -347,7 +363,6 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c) #ifdef CONFIG_SMP unsigned bits; int cpu = smp_processor_id(); - unsigned int socket_id, core_complex_id; bits = c->x86_coreid_bits; /* Low order bits define the core id (index of core in socket) */ @@ -357,18 +372,6 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c) /* use socket ID also for last level cache */ per_cpu(cpu_llc_id, cpu) = c->phys_proc_id; amd_get_topology(c); - - /* -* Fix percpu cpu_llc_id here as LLC topology is different -* for Fam17h systems. -*/ -if (c->x86 != 0x17 || !cpuid_edx(0x8006)) - return; - - socket_id = (c->apicid >> bits) - 1; - core_complex_id = (c->apicid & ((1 << bits) - 1)) >> 3; - - per_cpu(cpu_llc_id, cpu) = (socket_id << 3) | core_complex_id; #endif } -- 1.9.1
Re: linux-next: manual merge of the edac-amd tree with the edac tree
On Thu, Dec 01, 2016 at 10:06:17AM -0200, Mauro Carvalho Chehab wrote: > > However, rebasing over your tree showed a new documentation gap: > ./include/linux/edac.h:144: warning: Enum value 'HW_EVENT_ERR_DEFERRED' > not described in enum 'hw_event_mc_err_type' > > With was introduced by this commit: > > commit d12a969ebbfcfc25853c4147d42b388f758e8784 > Author: Yazen Ghannam > Date: Thu Nov 17 17:57:32 2016 -0500 > > EDAC, amd64: Add Deferred Error type > > Currently, deferred errors are classified as correctable in EDAC. Add a > new error type for deferred errors so that they are correctly reported > to the user. > > Signed-off-by: Yazen Ghannam > Cc: Aravind Gopalakrishnan > Cc: linux-edac > Link: > http://lkml.kernel.org/r/1479423463-8536-7-git-send-email-yazen.ghan...@amd.com > Signed-off-by: Borislav Petkov > > > Yazen introduced a "deferred error" code (whatever it means), but didn't > document what's that. Unfortunately, the patch description is also > not clear enough about what a "deferred error" means or how userspace > is supposed to handle it. > > Yazen, > > Could you please send us a patch adding a proper description for this > new error code? > Hi Mauro, A deferred error is an uncorrectable error whose handling can be deferred, i.e. it's not urgent. This affects the system behavior, but I'm now thinking that this shouldn't affect users' behavior. I think it would be simpler to just classify deferred errors as uncorrectable errors so that users treat them as such. Boris, Can we drop or revert commit d12a969ebbfc? And can we apply a fixup like this to commit 713ad54675fd? --- From: Yazen Ghannam Date: Thu, 1 Dec 2016 08:54:49 -0600 Subject: [PATCH] fixup! EDAC, amd64: Define and register UMC error decode function Signed-off-by: Yazen Ghannam --- drivers/edac/amd64_edac.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 991b36c..245b9a0 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -2480,8 +2480,9 @@ static void decode_umc_error(int node_id, struct mce *m) memset(&err, 0, sizeof(err)); + /* Log deferred errors as uncorrectable errors. */ if (m->status & MCI_STATUS_DEFERRED) - ecc_type = 3; + ecc_type = 1; err.channel = find_umc_channel(pvt, m); if (err.channel < 0) { -- 2.7.4 --- Thanks, Yazen
Re: linux-next: manual merge of the edac-amd tree with the edac tree
On Thu, Dec 01, 2016 at 07:15:01PM +0100, Borislav Petkov wrote: > On Thu, Dec 01, 2016 at 11:02:04AM -0500, Yazen Ghannam wrote: > > A deferred error is an uncorrectable error whose handling can be > > deferred, i.e. it's not urgent. This affects the system behavior, but > > I'm now thinking that this shouldn't affect users' behavior. I think it > > would be simpler to just classify deferred errors as uncorrectable > > errors so that users treat them as such. > > Why would we want to lie about deferred errors being uncorrectable? > They are uncorrectable errors that can be handled differently. If you can't handle them then there's not much difference. > And I believe deferred errors can be handled differently like freeze the > process using the page instead of killing it. And so on... > If deferred errors can be handled differently in userspace, then you're right we should maintain the distinction. I was thinking we'd only handle them in the kernel. > Why aren't you simply adding the documentation about > HW_EVENT_ERR_DEFERRED and be done with it? The downstream path like > tracepoint and all can handle all that just fine. > Okay, will do. > > Boris, > > Can we drop or revert commit d12a969ebbfc? > > No can do. It is a public branch and there's no touching it. > Okay, got it. Thanks, Yazen
[PATCH] x86/mce/AMD: Give a name to MCA bank 3 to use with legacy MSRs
From: Yazen Ghannam MCA bank 3 is reserved on systems pre-Fam17h, so it didn't have a name. However, MCA bank 3 is defined on Fam17h systems and can be accessed using legacy MSRs. Without a name we get a stack trace on Fam17h systems when trying to register sysfs files for bank 3 on kernels that don't recognize Scalable MCA. Call MCA bank 3 "decode_unit" since this is what it represents on Fam17h. This will allow kernels without SMCA support to see this bank on Fam17h+ and prevent the stack trace. This will not affect older systems since this bank is reserved on them, i.e. it'll be ignored. Tested on AMD Fam15h and Fam17h systems. WARNING: CPU: 26 PID: 1 at lib/kobject.c:210 kobject_add_internal+0x23e/0x340() kobject: (88085bb256c0): attempted to be registered with empty name! ... Call Trace: [] dump_stack+0x63/0x90 [] warn_slowpath_common+0x86/0xc0 [] warn_slowpath_fmt+0x4c/0x50 [] ? kobject_add_internal+0x18e/0x340 [] ? kfree_const+0x22/0x30 [] kobject_add_internal+0x23e/0x340 [] ? kfree_const+0x22/0x30 [] kobject_add+0x68/0xb0 [] kobject_create_and_add+0x33/0x70 [] threshold_create_device+0x107/0x350 [] ? mcheck_vendor_init_severity+0x1a/0x1a [] threshold_init_device+0x35/0x4d [] do_one_initcall+0xb3/0x1d0 [] kernel_init_freeable+0x163/0x1f0 [] ? rest_init+0x80/0x80 [] kernel_init+0xe/0xe0 [] ret_from_fork+0x3f/0x70 [] ? rest_init+0x80/0x80 Signed-off-by: Yazen Ghannam Cc: # 3.10.. --- arch/x86/kernel/cpu/mcheck/mce_amd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index d3e5be8..c82befc 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -62,7 +62,7 @@ static const char * const th_names[] = { "load_store", "insn_fetch", "combined_unit", - "", + "decode_unit", "northbridge", "execution_unit", }; -- 2.7.4
[PATCH 2/2] x86/mce/AMD: Carve out SMCA bank configuration
From: Yazen Ghannam Scalable MCA systems have a new MCA_CONFIG register that we use to configure each bank. We currently use this when we set up thresholding. However, this is logically separate. Move setup of MCA_CONFIG into a separate function. Signed-off-by: Yazen Ghannam --- arch/x86/kernel/cpu/mcheck/mce_amd.c | 48 1 file changed, 27 insertions(+), 21 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index 4e459e0..95870b3 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -433,7 +433,7 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr, int offset, u32 misc_high) { unsigned int cpu = smp_processor_id(); - u32 smca_low, smca_high, smca_addr; + u32 smca_low, smca_high; struct threshold_block b; int new; @@ -457,7 +457,29 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr, goto set_offset; } - smca_addr = MSR_AMD64_SMCA_MCx_CONFIG(bank); + /* Gather LVT offset for thresholding: */ + if (rdmsr_safe(MSR_CU_DEF_ERR, &smca_low, &smca_high)) + goto out; + + new = (smca_low & SMCA_THR_LVT_OFF) >> 12; + +set_offset: + offset = setup_APIC_mce_threshold(offset, new); + + if ((offset == new) && (mce_threshold_vector != amd_threshold_interrupt)) + mce_threshold_vector = amd_threshold_interrupt; + +done: + mce_threshold_block_init(&b, offset); + +out: + return offset; +} + +static void set_smca_config(unsigned int bank) +{ + u32 smca_low, smca_high; + u32 smca_addr = MSR_AMD64_SMCA_MCx_CONFIG(bank); if (!rdmsr_safe(smca_addr, &smca_low, &smca_high)) { /* @@ -487,24 +509,6 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr, wrmsr(smca_addr, smca_low, smca_high); } - - /* Gather LVT offset for thresholding: */ - if (rdmsr_safe(MSR_CU_DEF_ERR, &smca_low, &smca_high)) - goto out; - - new = (smca_low & SMCA_THR_LVT_OFF) >> 12; - -set_offset: - offset = setup_APIC_mce_threshold(offset, new); - - if ((offset == new) && (mce_threshold_vector != amd_threshold_interrupt)) - mce_threshold_vector = amd_threshold_interrupt; - -done: - mce_threshold_block_init(&b, offset); - -out: - return offset; } /* cpu init entry point, called from mce.c with preempt off */ @@ -515,8 +519,10 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c) int offset = -1; for (bank = 0; bank < mca_cfg.banks; ++bank) { - if (mce_flags.smca) + if (mce_flags.smca) { get_smca_bank_info(bank); + set_smca_config(bank); + } for (block = 0; block < NR_BLOCKS; ++block) { address = get_block_address(cpu, address, low, high, bank, block); -- 2.7.4
[PATCH 1/2] x86/mce/AMD: Redo use of SMCA MCA_DE{STAT,ADDR} registers
From: Yazen Ghannam We have support for the new SMCA MCA_DE{STAT,ADDR} registers in Linux. So we've used these registers in place of MCA_{STATUS,ADDR} on SMCA systems. However, the guidance for current implementations of SMCA is to continue using MCA_{STATUS,ADDR} and to use MCA_DE{STAT,ADDR} only if a Deferred error was not found in the former registers. This also means we shouldn't clear MCA_CONFIG[LogDeferredInMcaStat]. Redo the AMD Deferred error interrupt handler to follow the guidance for current SMCA systems. Also, don't break after finding the first error. Don't clear MCA_CONFIG[LogDeferredInMcaStat] during AMD mcheck init. Signed-off-by: Yazen Ghannam --- arch/x86/kernel/cpu/mcheck/mce_amd.c | 47 1 file changed, 21 insertions(+), 26 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index 524cc57..4e459e0 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -472,20 +472,6 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr, smca_high |= BIT(0); /* -* SMCA logs Deferred Error information in MCA_DE{STAT,ADDR} -* registers with the option of additionally logging to -* MCA_{STATUS,ADDR} if MCA_CONFIG[LogDeferredInMcaStat] is set. -* -* This bit is usually set by BIOS to retain the old behavior -* for OSes that don't use the new registers. Linux supports the -* new registers so let's disable that additional logging here. -* -* MCA_CONFIG[LogDeferredInMcaStat] is bit 34 (bit 2 in the high -* portion of the MSR). -*/ - smca_high &= ~BIT(2); - - /* * SMCA sets the Deferred Error Interrupt type per bank. * * MCA_CONFIG[DeferredIntTypeSupported] is bit 5, and tells us @@ -756,7 +742,8 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) EXPORT_SYMBOL_GPL(umc_normaddr_to_sysaddr); static void -__log_error(unsigned int bank, bool deferred_err, bool threshold_err, u64 misc) +__log_error(unsigned int bank, bool deferred_err, bool use_smca_destat, + bool threshold_err, u64 misc) { u32 msr_status = msr_ops.status(bank); u32 msr_addr = msr_ops.addr(bank); @@ -765,7 +752,7 @@ __log_error(unsigned int bank, bool deferred_err, bool threshold_err, u64 misc) WARN_ON_ONCE(deferred_err && threshold_err); - if (deferred_err && mce_flags.smca) { + if (deferred_err && use_smca_destat) { msr_status = MSR_AMD64_SMCA_MCx_DESTAT(bank); msr_addr = MSR_AMD64_SMCA_MCx_DEADDR(bank); } @@ -807,6 +794,10 @@ __log_error(unsigned int bank, bool deferred_err, bool threshold_err, u64 misc) mce_log(&m); + /* We should still clear MCA_DESTAT even if we used MCA_STATUS. */ + if (mce_flags.smca && !use_smca_destat) + wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0); + wrmsrl(msr_status, 0); } @@ -832,25 +823,29 @@ asmlinkage __visible void __irq_entry smp_trace_deferred_error_interrupt(void) exiting_ack_irq(); } +static inline bool check_deferred_status(u64 status) +{ + return ((status & MCI_STATUS_VAL) && (status & MCI_STATUS_DEFERRED)); +} + /* APIC interrupt handler for deferred errors */ static void amd_deferred_error_interrupt(void) { unsigned int bank; - u32 msr_status; u64 status; for (bank = 0; bank < mca_cfg.banks; ++bank) { - msr_status = (mce_flags.smca) ? MSR_AMD64_SMCA_MCx_DESTAT(bank) - : msr_ops.status(bank); + rdmsrl(msr_ops.status(bank), status); - rdmsrl(msr_status, status); + if (check_deferred_status(status)) { + __log_error(bank, true, false, false, 0); - if (!(status & MCI_STATUS_VAL) || - !(status & MCI_STATUS_DEFERRED)) - continue; + } else if (mce_flags.smca) { + rdmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), status); - __log_error(bank, true, false, 0); - break; + if (check_deferred_status(status)) + __log_error(bank, true, true, false, 0); + } } } @@ -904,7 +899,7 @@ static void amd_threshold_interrupt(void) return; log: - __log_error(bank, false, true, ((u64)high << 32) | low); + __log_error(bank, false, false, true, ((u64)high << 32) | low); /* Reset threshold block after logging error. */ memset(&tr, 0, sizeof(tr)); -- 2.7.4
[PATCH v2 2/4] x86/mce/AMD; EDAC,amd64: Move find_umc_channel() to AMD mcheck
We need to find a UMC's channel number from mcheck code when translating UMC normalized addresses to system physical addresses. So move the function there from EDAC. Also, drop the struct pvt from the function parameters since we don't use it. And add a sanity check to make sure we're only looking at UMCs in case the UMC instance IDs ever match up with other bank types. Signed-off-by: Yazen Ghannam --- Link: http://lkml.kernel.org/r/1486760120-60944-2-git-send-email-yazen.ghan...@amd.com v1->v2: - Redo commit message based on comments. - Add UMC bank type sanity check. arch/x86/include/asm/mce.h | 2 ++ arch/x86/kernel/cpu/mcheck/mce_amd.c | 21 + drivers/edac/amd64_edac.c| 20 +--- 3 files changed, 24 insertions(+), 19 deletions(-) diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index e638736..1ac261c 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -258,9 +258,11 @@ static inline void cmci_recheck(void) {} #ifdef CONFIG_X86_MCE_AMD void mce_amd_feature_init(struct cpuinfo_x86 *c); int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr); +int find_umc_channel(struct mce *m); #else static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { } static inline int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) { return -EINVAL; }; +static inline int find_umc_channel(struct mce *m) { return -EINVAL; }; #endif int mce_available(struct cpuinfo_x86 *c); diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index 524cc57..10fddcc 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -755,6 +755,27 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) } EXPORT_SYMBOL_GPL(umc_normaddr_to_sysaddr); +/* + * To find the UMC channel represented by this bank we need to match on its + * instance_id. The instance_id of a bank is held in the lower 32 bits of its + * IPID. + */ +int find_umc_channel(struct mce *m) +{ + u32 umc_instance_id[] = {0x50f00, 0x150f00}; + u32 instance_id = m->ipid & GENMASK(31, 0); + int i, channel = -EINVAL; + + if (smca_banks[m->bank].hwid && + smca_banks[m->bank].hwid->bank_type == SMCA_UMC) + for (i = 0; i < ARRAY_SIZE(umc_instance_id); i++) + if (umc_instance_id[i] == instance_id) + channel = i; + + return channel; +} +EXPORT_SYMBOL_GPL(find_umc_channel); + static void __log_error(unsigned int bank, bool deferred_err, bool threshold_err, u64 misc) { diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 82dab16..11f973d 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -2447,24 +2447,6 @@ static inline void decode_bus_error(int node_id, struct mce *m) __log_ecc_error(mci, &err, ecc_type); } -/* - * To find the UMC channel represented by this bank we need to match on its - * instance_id. The instance_id of a bank is held in the lower 32 bits of its - * IPID. - */ -static int find_umc_channel(struct amd64_pvt *pvt, struct mce *m) -{ - u32 umc_instance_id[] = {0x50f00, 0x150f00}; - u32 instance_id = m->ipid & GENMASK(31, 0); - int i, channel = -1; - - for (i = 0; i < ARRAY_SIZE(umc_instance_id); i++) - if (umc_instance_id[i] == instance_id) - channel = i; - - return channel; -} - static void decode_umc_error(int node_id, struct mce *m) { u8 ecc_type = (m->status >> 45) & 0x3; @@ -2484,7 +2466,7 @@ static void decode_umc_error(int node_id, struct mce *m) if (m->status & MCI_STATUS_DEFERRED) ecc_type = 3; - err.channel = find_umc_channel(pvt, m); + err.channel = find_umc_channel(m); if (err.channel < 0) { err.err_code = ERR_CHANNEL; goto log_error; -- 2.7.4
[PATCH v2 1/4] EDAC,mce_amd: Find node ID on SMCA systems using generic methods
From: Yazen Ghannam We should move away from using AMD-specific amd_get_nb_id() to find a node ID and move toward using generic Linux methods. We can use cpu_to_node() since NUMA should be working as expected on newly released Fam17h systems. Replace call to amd_get_nb_id() and related shifting by a call to cpu_to_node() which works as expected. Signed-off-by: Yazen Ghannam --- Link: http://lkml.kernel.org/r/1486760120-60944-1-git-send-email-yazen.ghan...@amd.com v1->v2: - Just use cpu_to_node() instead of trying to make amd_get_nb_id() work. drivers/edac/mce_amd.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c index ba35b7e..dbccf40 100644 --- a/drivers/edac/mce_amd.c +++ b/drivers/edac/mce_amd.c @@ -878,12 +878,8 @@ static void decode_smca_errors(struct mce *m) pr_cont("%s.\n", smca_mce_descs[bank_type].descs[xec]); } - /* -* amd_get_nb_id() returns the last level cache id. -* The last level cache on Fam17h is 1 level below the node. -*/ if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc) - decode_dram_ecc(amd_get_nb_id(m->extcpu) >> 1, m); + decode_dram_ecc(cpu_to_node(m->extcpu), m); } static inline void amd_decode_err_code(u16 ec) -- 2.7.4
[PATCH v2 3/4] x86/mce/AMD: Mark Deferred errors as Action Optional on SMCA systems
From: Yazen Ghannam Give Deferred errors an Action Optional severity on SMCA systems so that the SRAO notifier block can potentially handle them. Signed-off-by: Yazen Ghannam --- Link: http://lkml.kernel.org/r/1486760120-60944-3-git-send-email-yazen.ghan...@amd.com v1->v2: - New in v2. Based on v1 patch 3. - Use mce_severity() in AMD interrupt handlers. - Set proper severity so we can use notifier chain instead of calling memory_failure() directly. arch/x86/kernel/cpu/mcheck/mce-severity.c | 6 +- arch/x86/kernel/cpu/mcheck/mce_amd.c | 4 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c index 87cc9ab..2773c85 100644 --- a/arch/x86/kernel/cpu/mcheck/mce-severity.c +++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c @@ -278,8 +278,12 @@ static int mce_severity_amd(struct mce *m, int tolerant, char **msg, bool is_exc * deferred error: poll handler catches these and adds to mce_ring so * memory-failure can take recovery actions. */ - if (m->status & MCI_STATUS_DEFERRED) + if (m->status & MCI_STATUS_DEFERRED) { + if (mce_flags.smca) + return MCE_AO_SEVERITY; + return MCE_DEFERRED_SEVERITY; + } /* * corrected error: poll handler catches these and passes responsibility diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index 10fddcc..743ae31 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -28,6 +28,8 @@ #include #include +#include "mce-internal.h" + #define NR_BLOCKS 5 #define THRESHOLD_MAX 0xFFF #define INT_TYPE_APIC 0x0002 @@ -802,6 +804,8 @@ __log_error(unsigned int bank, bool deferred_err, bool threshold_err, u64 misc) m.bank = bank; m.tsc= rdtsc(); + m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false); + if (threshold_err) m.misc = misc; -- 2.7.4
[PATCH v2 0/4] Call memory_failure() on Deferred errors
From: Yazen Ghannam This set is based on an earlier 3 patch set. Patch 1: - Address comments by using cpu_to_node() when finding a node ID rather than amd_get_nb_id(). Link: http://lkml.kernel.org/r/1486760120-60944-1-git-send-email-yazen.ghan...@amd.com Patch 2: - Fix up commit message. Link: http://lkml.kernel.org/r/1486760120-60944-2-git-send-email-yazen.ghan...@amd.com Patches 3 and 4: - Redo Patch 3 from old set. - After looking at the CEC patch I got the idea to use the SRAO notifier block instead of calling memory_failure() from the AMD Deferred error interrupt handler. Link: http://lkml.kernel.org/r/1486760120-60944-3-git-send-email-yazen.ghan...@amd.com Yazen Ghannam (4): EDAC,mce_amd: Find node ID on SMCA systems using generic methods x86/mce/AMD; EDAC,amd64: Move find_umc_channel() to AMD mcheck x86/mce/AMD: Mark Deferred errors as Action Optional on SMCA systems x86/mce: Add AMD SMCA support to SRAO notifier arch/x86/include/asm/mce.h| 2 ++ arch/x86/kernel/cpu/mcheck/mce-severity.c | 6 +++- arch/x86/kernel/cpu/mcheck/mce.c | 52 --- arch/x86/kernel/cpu/mcheck/mce_amd.c | 25 +++ drivers/edac/amd64_edac.c | 20 +--- drivers/edac/mce_amd.c| 2 +- 6 files changed, 74 insertions(+), 33 deletions(-) -- 2.7.4
[PATCH v2 4/4] x86/mce: Add AMD SMCA support to SRAO notifier
From: Yazen Ghannam Deferred errors on AMD systems may get an Action Optional severity with the goal of being handled by the SRAO notifier block. However, the process of determining if an address is usable is different between Intel and AMD. So define vendor-specific functions for this. Also, check for the AO severity before determining if an address is usable to possibly save some cycles. Signed-off-by: Yazen Ghannam --- Link: http://lkml.kernel.org/r/1486760120-60944-3-git-send-email-yazen.ghan...@amd.com v1->v2: - New in v2. Based on v1 patch 3. - Update SRAO notifier block to handle errors from SMCA systems. arch/x86/kernel/cpu/mcheck/mce.c | 52 ++-- 1 file changed, 40 insertions(+), 12 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 5e365a2..1a2669d 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -547,19 +547,49 @@ static void mce_report_event(struct pt_regs *regs) * be somewhat complicated (e.g. segment offset would require an instruction * parser). So only support physical addresses up to page granuality for now. */ -static int mce_usable_address(struct mce *m) +static int mce_usable_address_intel(struct mce *m, unsigned long *pfn) { - if (!(m->status & MCI_STATUS_MISCV) || !(m->status & MCI_STATUS_ADDRV)) + if (!(m->status & MCI_STATUS_MISCV)) return 0; - - /* Checks after this one are Intel-specific: */ - if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) - return 1; - if (MCI_MISC_ADDR_LSB(m->misc) > PAGE_SHIFT) return 0; if (MCI_MISC_ADDR_MODE(m->misc) != MCI_MISC_ADDR_PHYS) return 0; + + *pfn = m->addr >> PAGE_SHIFT; + return 1; +} + +/* Only support this on SMCA systems and errors logged from a UMC. */ +static int mce_usable_address_amd(struct mce *m, unsigned long *pfn) +{ + u8 umc; + u16 nid = cpu_to_node(m->extcpu); + u64 addr; + + if (!mce_flags.smca) + return 0; + + umc = find_umc_channel(m); + + if (umc < 0 || umc_normaddr_to_sysaddr(m->addr, nid, umc, &addr)) + return 0; + + *pfn = addr >> PAGE_SHIFT; + return 1; +} + +static int mce_usable_address(struct mce *m, unsigned long *pfn) +{ + if (!(m->status & MCI_STATUS_ADDRV)) + return 0; + + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) + return mce_usable_address_intel(m, pfn); + + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) + return mce_usable_address_amd(m, pfn); + return 1; } @@ -567,15 +597,13 @@ static int srao_decode_notifier(struct notifier_block *nb, unsigned long val, void *data) { struct mce *mce = (struct mce *)data; - unsigned long pfn; + unsigned long pfn = 0; if (!mce) return NOTIFY_DONE; - if (mce_usable_address(mce) && (mce->severity == MCE_AO_SEVERITY)) { - pfn = mce->addr >> PAGE_SHIFT; + if ((mce->severity == MCE_AO_SEVERITY) && mce_usable_address(mce, &pfn)) memory_failure(pfn, MCE_VECTOR, 0); - } return NOTIFY_OK; } @@ -750,7 +778,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b) */ if (!(flags & MCP_DONTLOG) && !mca_cfg.dont_log_ce) mce_log(&m); - else if (mce_usable_address(&m)) { + else if (mce_usable_address(&m, NULL)) { /* * Although we skipped logging this, we still want * to take action. Add to the pool so the registered -- 2.7.4
[PATCH] x86/mce: Do feature check earlier
We may miss some information when errors are logged during boot before the feature flags are set. For example, on SMCA systems we will not log the MCA_IPID and MCA_SYND registers and we won't mask MCA_ADDR appropriately. Move the feature checks before generic init. The rest of the vendor feature initialization will still happen after generic init. Signed-off-by: Yazen Ghannam --- arch/x86/kernel/cpu/mcheck/mce.c | 27 --- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 177472a..d1f675b 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -1702,14 +1702,9 @@ static int __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c) return 0; } -static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c) +static void __mcheck_cpu_init_feature_early(struct cpuinfo_x86 *c) { switch (c->x86_vendor) { - case X86_VENDOR_INTEL: - mce_intel_feature_init(c); - mce_adjust_timer = cmci_intel_adjust_timer; - break; - case X86_VENDOR_AMD: { mce_flags.overflow_recov = !!cpu_has(c, X86_FEATURE_OVERFLOW_RECOV); mce_flags.succor = !!cpu_has(c, X86_FEATURE_SUCCOR); @@ -1724,7 +1719,6 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c) msr_ops.addr= smca_addr_reg; msr_ops.misc= smca_misc_reg; } - mce_amd_feature_init(c); break; } @@ -1734,6 +1728,24 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c) } } +static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c) +{ + switch (c->x86_vendor) { + case X86_VENDOR_INTEL: + mce_intel_feature_init(c); + mce_adjust_timer = cmci_intel_adjust_timer; + break; + + case X86_VENDOR_AMD: { + mce_amd_feature_init(c); + break; + } + + default: + break; + } +} + static void __mcheck_cpu_clear_vendor(struct cpuinfo_x86 *c) { switch (c->x86_vendor) { @@ -1812,6 +1824,7 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c) machine_check_vector = do_machine_check; + __mcheck_cpu_init_feature_early(c); __mcheck_cpu_init_generic(); __mcheck_cpu_init_vendor(c); __mcheck_cpu_init_clear_banks(); -- 2.7.4
Re: [PATCH 1/3] x86/RAS: Simplify SMCA bank descriptor struct
> > Call the struct simply smca_bank, it's instance ID can be simply ->id. > Makes the code much more readable. > > Signed-off-by: Borislav Petkov Looks good to me. Please add: Tested-by: Yazen Ghannam Ditto for the others. Thanks, Yazen
Re: [PATCH] x86/mce: fix a wrong assignment of i_mce.status
On Thu, Jun 11, 2020 at 12:55:00PM -0400, Luck, Tony wrote: > +Yazen > > On Thu, Jun 11, 2020 at 10:32:38AM +0800, Zhenzhong Duan wrote: > > The original code is a nop as i_mce.status is or'ed with part of itself, > > fix it. > > > > Signed-off-by: Zhenzhong Duan > > --- > > arch/x86/kernel/cpu/mce/inject.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/x86/kernel/cpu/mce/inject.c > > b/arch/x86/kernel/cpu/mce/inject.c > > index 3413b41..dc28a61 100644 > > --- a/arch/x86/kernel/cpu/mce/inject.c > > +++ b/arch/x86/kernel/cpu/mce/inject.c > > @@ -511,7 +511,7 @@ static void do_inject(void) > > */ > > if (inj_type == DFR_INT_INJ) { > > i_mce.status |= MCI_STATUS_DEFERRED; > > - i_mce.status |= (i_mce.status & ~MCI_STATUS_UC); > > + i_mce.status &= ~MCI_STATUS_UC; > > Boris: "git blame" says you wrote this code. Patch looks right (in > that it makes the code do what the comment just above says it is trying > to do): > > * - MCx_STATUS[UC] cleared: deferred errors are _not_ UC > > But this is AMD specific, so I'll defer judgement > Acked-by: Yazen Ghannam Thanks, Yazen
[PATCH] EDAC/mce_amd: Add new error descriptions for existing types
From: Yazen Ghannam A few existing MCA bank types will have new error types in future SMCA systems. Add the descriptions for the new error types. Signed-off-by: Yazen Ghannam --- drivers/edac/mce_amd.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c index 325aedf46ff2..4fd06a3dc6fe 100644 --- a/drivers/edac/mce_amd.c +++ b/drivers/edac/mce_amd.c @@ -210,6 +210,11 @@ static const char * const smca_if_mce_desc[] = { "L2 BTB Multi-Match Error", "L2 Cache Response Poison Error", "System Read Data Error", + "Hardware Assertion Error", + "L1-TLB Multi-Hit", + "L2-TLB Multi-Hit", + "BSR Parity Error", + "CT MCE", }; static const char * const smca_l2_mce_desc[] = { @@ -228,7 +233,8 @@ static const char * const smca_de_mce_desc[] = { "Fetch address FIFO parity error", "Patch RAM data parity error", "Patch RAM sequencer parity error", - "Micro-op buffer parity error" + "Micro-op buffer parity error", + "Hardware Assertion MCA Error", }; static const char * const smca_ex_mce_desc[] = { @@ -244,6 +250,8 @@ static const char * const smca_ex_mce_desc[] = { "Scheduling queue parity error", "Branch buffer queue parity error", "Hardware Assertion error", + "Spec Map parity error", + "Retire Map parity error", }; static const char * const smca_fp_mce_desc[] = { @@ -360,6 +368,7 @@ static const char * const smca_smu2_mce_desc[] = { "Instruction Tag Cache Bank A ECC or parity error", "Instruction Tag Cache Bank B ECC or parity error", "System Hub Read Buffer ECC or parity error", + "PHY RAM ECC error", }; static const char * const smca_mp5_mce_desc[] = { -- 2.25.1
Re: [PATCH v2 8/8] x86/MCE/AMD Support new memory interleaving modes during address translation
On Mon, Sep 28, 2020 at 08:14:07PM +0200, Borislav Petkov wrote: > On Mon, Sep 28, 2020 at 10:53:50AM -0500, Yazen Ghannam wrote: > > > I agree that the translation code is implementation-specific and applies > > only to DRAM ECC errors, so it make sense to have it in amd64_edac. The > > only issue is getting the address translation to earlier notifiers. I > > think we can add a new one in amd64_edac to run before others. Maybe this > > can be a new priority class like MCE_PRIO_PREPROCESS, or something like > > that for notifiers that fixup the MCE data. > > Well, I'm not sure you need notifiers here - you wanna call > mce_usable_address() and in it, it should do the address conversion > calculation to give you a physical address which you can feed to > memory_failure etc. > > Now, mce_usable_address() is core code and we can make core code call > into a module but that is yucky. So *that* is your reason for keeping it > where it is. > Okay, we'll keep the code where it is. I'll work on another set to call the address translation with mce_usable_address(). > Looking at its size: > > $ readelf -s vmlinux | grep umc_normaddr_to > 2864: 817d8ae5 168 FUNCLOCAL DEFAULT1 > umc_normaddr_to_[...] > 91866: 81030e00 1127 FUNCGLOBAL DEFAULT1 > umc_normaddr_to_[...] > > that's something like ~1.3K and if you split it and do some > experimenting, you might get it even slimmer. Not that ~1.3K is that > huge for current standards but we should always aim at not bloating the > fat guy our kernel already is. > Okay, I'll keep an eye on this and try to slim it down. Thanks, Yazen
[PATCH] EDAC/amd64: Set proper family type for Family 19h Models 20h-2Fh
From: Yazen Ghannam AMD Family 19h Models 20h-2Fh use the same PCI IDs as Family 17h Models 70h-7Fh. The same family ops and number of channels also apply. Use the Family17h Model 70h family_type and ops for Family 19h Models 20h-2Fh. Update the controller name to match the system. Signed-off-by: Yazen Ghannam --- drivers/edac/amd64_edac.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index fcc08bbf6945..1362274d840b 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -3385,6 +3385,12 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt) break; case 0x19: + if (pvt->model >= 0x20 && pvt->model <= 0x2f) { + fam_type = &family_types[F17_M70H_CPUS]; + pvt->ops = &family_types[F17_M70H_CPUS].ops; + fam_type->ctl_name = "F19h_M20h"; + break; + } fam_type= &family_types[F19_CPUS]; pvt->ops= &family_types[F19_CPUS].ops; family_types[F19_CPUS].ctl_name = "F19h"; -- 2.25.1
Re: [PATCH v4] cper, apei, mce: Pass x86 CPER through the MCA handling chain
On Fri, Sep 25, 2020 at 09:54:06AM +0900, Punit Agrawal wrote: > Borislav Petkov writes: > > > On Thu, Sep 24, 2020 at 12:23:27PM -0500, Smita Koralahalli Channabasappa > > wrote: > >> > Even though it's not defined in the UEFI spec, it doesn't mean a > >> > structure definition cannot be created. > > > > Created for what? That structure better have a big fat comment above it, > > what > > firmware generates its layout. > > Maybe I could've used a better choice of words - I meant to define a > structure with meaningful member names to replace the *(ptr + i) > accesses in the patch. > > The requirement for documenting the record layout doesn't change - > whether using raw pointer arithmetic vs a structure definition. > > >> > After all, the patch is relying on some guarantee of the meaning of > >> > the values and their ordering. > > > > AFAICT, this looks like an ad-hoc definition and the moment they change > > it in some future revision, that struct of yours becomes invalid so we'd > > need to add another one. > > If there's no spec backing the current layout, then it'll indeed be an > ad-hoc definition of a structure in the kernel. But considering that > it's part of firmware / OS interface for an important part of the RAS > story I would hope that the code is based on a spec - having that > reference included would help maintainability. > > Incompatible changes will indeed break the assumptions in the kernel and > code will need to be updated - regardless of the choice of kernel > implementation; pointer arithmetic, structure definition - ad-hoc or > spec provided. > > Having versioning will allow running older kernels on newer hardware and > vice versa - but I don't see why that is important only when using a > structure based access. > There is no versioning option for the x86 context info structure in the UEFI spec, so I don't think there'd be a clean way to include version information. The format of the data in the context info is not totally ad-hoc, and it does follow the UEFI spec. The "Register Array" field is raw data. This may follow one of the predefined formats in the UEFI spec like the "X64 Register State", etc. Or, in the case of MSR and Memory Mapped Registers, this is a raw dump of the registers starting from the address shown in the structure. The two values that can be changed are the starting address and the array size. These two together provide a window to the registers. The registers are fixed, so a single context info struture should include a single contiguous range of registers. Multiple context info structures can be provided to include registers from different, non-contiguous ranges. This patch is checking if an MSR context info structure lines up with the MCAX register space used on Scalable MCA systems. This register space is defined in the AMD Processor Programming Reference for various products. This is considered a hardware feature extension, so the existing register layout won't change though new registers may be added. A layout change would require moving to another register space which is what happened going from legacy MCA (starting at address 0x400) to MCAX (starting at address 0xC0002000) registers. The only two things firmware can change are from what address does the info start and where does the info end. So the implementation-specific details here are that currently the starting address is MCA_STATUS (in MCAX space) for a bank and the remaining info includes the other MCA registers for this bank. So I think the kernel can be strict with this format, i.e. the two variables match what we're looking for. This patch already has a check on the starting address. It should also include a check that "Register Array Size" is large enough to include all the registers we want to extract. If the format doesn't match, then we fall back to a raw dump of the data like we have today. Or the kernel can be more flexible and try to find the window of registers based on the starting address. I think this is really open-ended though. Does this sound reasonable? Thanks, Yazen
Re: [PATCH v2 8/8] x86/MCE/AMD Support new memory interleaving modes during address translation
On Fri, Sep 25, 2020 at 09:22:31AM +0200, Borislav Petkov wrote: > On Wed, Sep 23, 2020 at 11:25:10AM -0500, Yazen Ghannam wrote: > > I don't remember the original reason, and I was recently asked about > > this code living in a module. I did some looking after this ask, and I > > found that we should be using this translation to get a proper value for > > the memory error notifiers to use. So I think we still need to use this > > function some way with the core code even if the EDAC interface isn't > > used. > > You'd need to be more specific here, you want to bypass amd64_edac to > decode errors? Judging by the current RAS activity coming from you guys, > I'm thinking firmware. But then wouldn't the firmware do the decoding > for us and then this function is not even needed? > The UC, NFIT, and CEC notifiers all operate on system physical addresses. The address in the MCE record is checked by mce_usable_address() to see if it can be used by the kernel, i.e. the address is a system physical address. Right now, this check passes on AMD systems if MCA_STATUS[AddrV] is set. This works for memory errors on legacy AMD systems, since the NB MCA bank logs a physical address for DRAM ECC errors. But this won't work on newer systems, because the UMC MCA bank does not log a system physical address for DRAM ECC errors. So the address provided by the hardware will need to be translated to a physical address before the notifiers in the MCE chain can use it. We can add support to get the physical address from firmware in some cases. But it looks to me that we'll still need to keep updating the translation code in the kernel to cover some platform/user configurations. So it makes sense to me to move the functionality into a module to make it easier to update. The address translation needs to be done before the notfiers that need it, and EDAC comes after all of them. There's also the case where the EDAC interface isn't wanted, so amd64_edac will be unloaded. But the functionality in the other notifiers are still expected to be available. So it's more than just decoding the error like we do now with amd64_edac. That's why I think the translation code can be in a separate module with a notfier that runs before the others. This can do the translation once then pass the result down to the CEC, UC, NFIT, and EDAC notifiers to use as needed. Thanks, Yazen
Re: [PATCH v2 8/8] x86/MCE/AMD Support new memory interleaving modes during address translation
On Mon, Sep 28, 2020 at 11:47:59AM +0200, Borislav Petkov wrote: > On Fri, Sep 25, 2020 at 02:51:27PM -0500, Yazen Ghannam wrote: > > > The address translation needs to be done before the notfiers that need > > it, and EDAC comes after all of them. There's also the case where the > > EDAC interface isn't wanted, so amd64_edac will be unloaded. > > I'd be interested as to why. Because decoding addresses is amd64_edac > *core* functionality. We can stick it in drivers/edac/mce_amd.c but I'd > like to hear what those valid reasons are, not to use the driver which > is supposed to do that anyway. > I don't have any clear reasons. I just get vague use cases sometimes about not using EDAC and relying on other things. But it shouldn't hurt to have the module load anyway. The EDAC messages can be suppressed, and the sysfs interface can be ignored. So, after a bit more thought, this doesn't seem like a good reason. I agree that the translation code is implementation-specific and applies only to DRAM ECC errors, so it make sense to have it in amd64_edac. The only issue is getting the address translation to earlier notifiers. I think we can add a new one in amd64_edac to run before others. Maybe this can be a new priority class like MCE_PRIO_PREPROCESS, or something like that for notifiers that fixup the MCE data. I can start by moving the address translation to amd64_edac and doing the code cleanup. Thanks, Yazen
Re: 5.6.12 MCE on AMD EPYC 7502
On Fri, May 29, 2020 at 07:57:20AM -0400, Borislav Petkov wrote: > On Fri, May 29, 2020 at 01:55:29PM +0300, Dmitry Antipov wrote: > > Hello, > > > > I'm facing the following kernel messages running Debian 9 with > > custom 5.6.12 kernel running on AMD EPYC 7502 - based hardware: > > > > [138537.806814] mce: [Hardware Error]: Machine check events logged > > [138537.806818] [Hardware Error]: Corrected error, no action required. > > [138537.808456] [Hardware Error]: CPU:0 (17:31:0) > > MC27_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd822080b > > [138537.810080] [Hardware Error]: IPID: 0x0001002e1e01, Syndrome: > > 0x5a05 > > [138537.811694] [Hardware Error]: Power, Interrupts, etc. Ext. Error Code: > > 2, Link Error. > > [138537.813281] [Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: > > GEN, part-proc: SRC (no timeout) > > > > Is it related to some (not so) known CPU errata? > > Who knows. > There aren't any reported errata related to this that I could find. > > Should I try to update microcode, motherboard firmware, kernel, or whatever > > else? > > Yeah, BIOS update might be a good idea, if there's a newer version for > your board. > I agree. The link settings are generally tuned for the platform. So the platform vendor may have a fix. Thanks, Yazen
Re: [PATCH 1/3] x86/amd_nb: add AMD family 17h model 60h PCI IDs
On Sun, May 10, 2020 at 04:48:40PM -0400, Alexander Monakov wrote: > Add PCI IDs for AMD Renoir (4000-series Ryzen CPUs). This is necessary > to enable support for temperature sensors via the k10temp module. > > Signed-off-by: Alexander Monakov > Cc: Thomas Gleixner > Cc: Borislav Petkov > Cc: x...@kernel.org > Cc: Yazen Ghannam > Cc: Brian Woods > Cc: Clemens Ladisch > Cc: Jean Delvare > Cc: Guenter Roeck > Cc: linux-hw...@vger.kernel.org > Cc: linux-e...@vger.kernel.org Acked-by: Yazen Ghannam Thanks, Yazen
Re: [PATCH 3/3] EDAC/amd64: Add AMD family 17h model 60h PCI IDs
On Sun, May 10, 2020 at 04:48:42PM -0400, Alexander Monakov wrote: > Add support for AMD Renoir (4000-series Ryzen CPUs). > > Signed-off-by: Alexander Monakov > Cc: Thomas Gleixner > Cc: Borislav Petkov > Cc: x...@kernel.org > Cc: Yazen Ghannam > Cc: Brian Woods > Cc: Clemens Ladisch > Cc: Jean Delvare > Cc: Guenter Roeck > Cc: linux-hw...@vger.kernel.org > Cc: linux-e...@vger.kernel.org Acked-by: Yazen Ghannam Thanks, Yazen
[PATCH v2] x86/mce: Increase maximum number of banks to 64
From: Akshay Gupta ...because future AMD systems will support up to 64 MCA banks per CPU. MAX_NR_BANKS is used to allocate a number of data structures, and it is used as a ceiling for values read from MCG_CAP[Count]. Therefore, this change will have no functional effect on existing systems with 32 or fewer MCA banks per CPU. However, this will increase the size of the following structures. Global bitmaps: - core.c / mce_banks_ce_disabled - core.c / all_banks - core.c / valid_banks - core.c / toclear - Total: 32 new bits * 4 bitmaps = 16 new bytes Per-CPU bitmaps: - core.c / mce_poll_banks - intel.c / mce_banks_owned - Total: 32 new bits * 2 bitmaps = 8 new bytes The bitmaps are arrays of longs. So this change will only affect 32-bit execution, since there will be one additional long used. There will be no additional memory use on 64-bit execution, because the size of long is 64 bits. Global structs: - amd.c / struct smca_bank smca_banks[]: 16 bytes per bank - core.c / struct mce_bank_dev mce_bank_devs[]: 56 bytes per bank - Total: 32 new banks * (16 + 56) bytes = 2304 new bytes Per-CPU structs: - core.c / struct mce_bank mce_banks_array[]: 16 bytes per bank - Total: 32 new banks * 16 bytes = 512 new bytes 32-bit Total global size increase: 2320 bytes Total per-CPU size increase: 520 bytes 64-bit Total global size increase: 2304 bytes Total per-CPU size increase: 512 bytes This additional memory should still fit within the existing .data section of the kernel binary. However, in the case where it doesn't fit, an additional page (4kB) of memory will be added to the binary to accommodate the extra data. Signed-off-by: Akshay Gupta [ Adjust commit message and code comment. ] Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20200820170624.1855825-1-yazen.ghan...@amd.com v1->v2: * Update commit message with discussion details from review. arch/x86/include/asm/mce.h | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index 6adced6e7dd3..109af5c7f515 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -200,12 +200,8 @@ void mce_setup(struct mce *m); void mce_log(struct mce *m); DECLARE_PER_CPU(struct device *, mce_device); -/* - * Maximum banks number. - * This is the limit of the current register layout on - * Intel CPUs. - */ -#define MAX_NR_BANKS 32 +/* Maximum number of MCA banks per CPU. */ +#define MAX_NR_BANKS 64 #ifdef CONFIG_X86_MCE_INTEL void mce_intel_feature_init(struct cpuinfo_x86 *c); -- 2.25.1
[PATCH v2 3/8] EDAC/mce_amd: Use struct cpuinfo_x86.node_id for NodeId
From: Yazen Ghannam The edac_mce_amd module calls decode_dram_ecc() on AMD Family17h and later systems. This function is used in amd64_edac_mod to do system-specific decoding for DRAM ECC errors. The function takes a "NodeId" as a parameter. In AMD documentation, NodeId is used to identify a physical die in a system. This can be used to identify a node in the AMD_NB code and also it is used with umc_normaddr_to_sysaddr(). However, the input used for decode_dram_ecc() is currently the NUMA node of a logical CPU. In the default configuration, the NUMA node and physical die will be equivalent, so this doesn't have an impact. But the NUMA node configuration can be adjusted with optional memory interleaving modes. This will cause the NUMA node enumeration to not match the physical die enumeration. The mismatch will cause the address translation function to fail or report incorrect results. Use struct cpuinfo_x86.node_id for the node_id parameter to ensure the physical ID is used. Fixes: fbe63acf62f5 ("EDAC, mce_amd: Use cpu_to_node() to find the node ID") Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20200814191449.183998-2-yazen.ghan...@amd.com v1 -> v2: * Redo based on change in Patch 1. drivers/edac/mce_amd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c index ac9bd74c92cd..91b5e3e0744e 100644 --- a/drivers/edac/mce_amd.c +++ b/drivers/edac/mce_amd.c @@ -1003,7 +1003,7 @@ static void decode_smca_error(struct mce *m) pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]); if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc) - decode_dram_ecc(cpu_to_node(m->extcpu), m); + decode_dram_ecc(cpu_data(m->extcpu).node_id, m); } static inline void amd_decode_err_code(u16 ec) -- 2.25.1
[PATCH v2 4/8] x86/MCE/AMD: Use defines for register addresses in translation code
From: Yazen Ghannam Replace raw register offset values in the AMD address translation code with named definitions. Also, drop comments that only note the register names. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20200814191449.183998-3-yazen.ghan...@amd.com v1 -> v2: * New patch based on comments for v1 Patch 2. arch/x86/kernel/cpu/mce/amd.c | 26 +++--- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index be96f77004ad..1e0510fd5afc 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -675,6 +675,14 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c) deferred_error_interrupt_enable(c); } +#define DF_F0_FABRICINSTINFO3 0x50 +#define DF_F0_MMIOHOLE 0x104 +#define DF_F0_DRAMBASEADDR 0x110 +#define DF_F0_DRAMLIMITADDR0x114 +#define DF_F0_DRAMOFFSET 0x1B4 + +#define DF_F1_SYSFABRICID 0x208 + int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) { u64 dram_base_addr, dram_limit_addr, dram_hole_base; @@ -691,22 +699,21 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) u8 cs_mask, cs_id = 0; bool hash_enabled = false; - /* Read D18F0x1B4 (DramOffset), check if base 1 is used. */ - if (amd_df_indirect_read(nid, 0, 0x1B4, umc, &tmp)) + if (amd_df_indirect_read(nid, 0, DF_F0_DRAMOFFSET, umc, &tmp)) goto out_err; /* Remove HiAddrOffset from normalized address, if enabled: */ if (tmp & BIT(0)) { u64 hi_addr_offset = (tmp & GENMASK_ULL(31, 20)) << 8; + /* Check if base 1 is used. */ if (norm_addr >= hi_addr_offset) { ret_addr -= hi_addr_offset; base = 1; } } - /* Read D18F0x110 (DramBaseAddress). */ - if (amd_df_indirect_read(nid, 0, 0x110 + (8 * base), umc, &tmp)) + if (amd_df_indirect_read(nid, 0, DF_F0_DRAMBASEADDR + (8 * base), umc, &tmp)) goto out_err; /* Check if address range is valid. */ @@ -728,8 +735,7 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) goto out_err; } - /* Read D18F0x114 (DramLimitAddress). */ - if (amd_df_indirect_read(nid, 0, 0x114 + (8 * base), umc, &tmp)) + if (amd_df_indirect_read(nid, 0, DF_F0_DRAMLIMITADDR + (8 * base), umc, &tmp)) goto out_err; intlv_num_sockets = (tmp >> 8) & 0x1; @@ -780,12 +786,11 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) u8 die_id_bit, sock_id_bit, cs_fabric_id; /* -* Read FabricBlockInstanceInformation3_CS[BlockFabricID]. * This is the fabric id for this coherent slave. Use * umc/channel# as instance id of the coherent slave * for FICAA. */ - if (amd_df_indirect_read(nid, 0, 0x50, umc, &tmp)) + if (amd_df_indirect_read(nid, 0, DF_F0_FABRICINSTINFO3, umc, &tmp)) goto out_err; cs_fabric_id = (tmp >> 8) & 0xFF; @@ -800,9 +805,8 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) sock_id_bit = die_id_bit; - /* Read D18F1x208 (SystemFabricIdMask). */ if (intlv_num_dies || intlv_num_sockets) - if (amd_df_indirect_read(nid, 1, 0x208, umc, &tmp)) + if (amd_df_indirect_read(nid, 1, DF_F1_SYSFABRICID, umc, &tmp)) goto out_err; /* If interleaved over more than 1 die. */ @@ -841,7 +845,7 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) /* If legacy MMIO hole enabled */ if (lgcy_mmio_hole_en) { - if (amd_df_indirect_read(nid, 0, 0x104, umc, &tmp)) + if (amd_df_indirect_read(nid, 0, DF_F0_MMIOHOLE, umc, &tmp)) goto out_err; dram_hole_base = tmp & GENMASK(31, 24); -- 2.25.1
[PATCH v2 0/8] AMD MCA Address Translation Updates
From: Yazen Ghannam This patchset includes updates for the MCA Address Translation process on recent AMD systems. Patches 1 & 3: Fixes an input to the address translation function. The translation requires a physical Die ID (NodeId in AMD documentation) rather than a logicial NUMA node ID. This is because the physical and logical nodes may not always match. Patch 2: Removes a function that is no longer needed with Patch 1. Patches 4-7: Code cleanup in preparation for Patch 8. Patch 8: Add translation support for new memory interleaving options available in Rome systems. The patch is based on the latest AMD reference code for the address translation. Patches 6-8 have checkpatch warnings about long lines, but I kept the long lines for readability. Thanks, Yazen Link: https://lkml.kernel.org/r/20200814191449.183998-1-yazen.ghan...@amd.com v1 -> v2: * Save the AMD NodeId value in struct cpuinfo_x86 rather than use a local value in MCA code. * Include code cleanup for AMD MCA Address Translation function before adding new functionality. Muralidhara M K (1): x86/MCE/AMD Support new memory interleaving modes during address translation Yazen Ghannam (7): x86/CPU/AMD: Save NodeId on AMD-based systems x86/CPU/AMD: Remove amd_get_nb_id() EDAC/mce_amd: Use struct cpuinfo_x86.node_id for NodeId x86/MCE/AMD: Use defines for register addresses in translation code x86/MCE/AMD: Use macros to get bitfields in translation code x86/MCE/AMD: Drop tmp variable in translation code x86/MCE/AMD: Group register reads in translation code arch/x86/events/amd/core.c | 2 +- arch/x86/include/asm/cacheinfo.h | 4 +- arch/x86/include/asm/processor.h | 3 +- arch/x86/kernel/amd_nb.c | 4 +- arch/x86/kernel/cpu/amd.c| 17 +- arch/x86/kernel/cpu/cacheinfo.c | 8 +- arch/x86/kernel/cpu/hygon.c | 11 +- arch/x86/kernel/cpu/mce/amd.c| 284 ++- arch/x86/kernel/cpu/mce/inject.c | 4 +- drivers/edac/amd64_edac.c| 4 +- drivers/edac/mce_amd.c | 4 +- 11 files changed, 233 insertions(+), 112 deletions(-) -- 2.25.1
[PATCH v2 5/8] x86/MCE/AMD: Use macros to get bitfields in translation code
From: Yazen Ghannam Define macros to get individual bits and bitfields. Use these to make the code more readable. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20200814191449.183998-3-yazen.ghan...@amd.com v1 -> v2: * New patch based on comments for v1 Patch 2. arch/x86/kernel/cpu/mce/amd.c | 46 +-- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index 1e0510fd5afc..90c3ad61ae19 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -675,6 +675,9 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c) deferred_error_interrupt_enable(c); } +#define get_bits(x, msb, lsb) ((x & GENMASK_ULL(msb, lsb)) >> lsb) +#define get_bit(x, bit)((x >> bit) & BIT(0)) + #define DF_F0_FABRICINSTINFO3 0x50 #define DF_F0_MMIOHOLE 0x104 #define DF_F0_DRAMBASEADDR 0x110 @@ -704,7 +707,7 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) /* Remove HiAddrOffset from normalized address, if enabled: */ if (tmp & BIT(0)) { - u64 hi_addr_offset = (tmp & GENMASK_ULL(31, 20)) << 8; + u64 hi_addr_offset = get_bits(tmp, 31, 20) << 28; /* Check if base 1 is used. */ if (norm_addr >= hi_addr_offset) { @@ -723,10 +726,10 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) goto out_err; } - lgcy_mmio_hole_en = tmp & BIT(1); - intlv_num_chan= (tmp >> 4) & 0xF; - intlv_addr_sel= (tmp >> 8) & 0x7; - dram_base_addr= (tmp & GENMASK_ULL(31, 12)) << 16; + lgcy_mmio_hole_en = get_bit(tmp, 1); + intlv_num_chan= get_bits(tmp, 7, 4); + intlv_addr_sel= get_bits(tmp, 10, 8); + dram_base_addr= get_bits(tmp, 31, 12) << 28; /* {0, 1, 2, 3} map to address bits {8, 9, 10, 11} respectively */ if (intlv_addr_sel > 3) { @@ -738,9 +741,9 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) if (amd_df_indirect_read(nid, 0, DF_F0_DRAMLIMITADDR + (8 * base), umc, &tmp)) goto out_err; - intlv_num_sockets = (tmp >> 8) & 0x1; - intlv_num_dies= (tmp >> 10) & 0x3; - dram_limit_addr = ((tmp & GENMASK_ULL(31, 12)) << 16) | GENMASK_ULL(27, 0); + intlv_num_sockets = get_bit(tmp, 8); + intlv_num_dies= get_bits(tmp, 11, 10); + dram_limit_addr = (get_bits(tmp, 31, 12) << 28) | GENMASK_ULL(27, 0); intlv_addr_bit = intlv_addr_sel + 8; @@ -793,7 +796,7 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) if (amd_df_indirect_read(nid, 0, DF_F0_FABRICINSTINFO3, umc, &tmp)) goto out_err; - cs_fabric_id = (tmp >> 8) & 0xFF; + cs_fabric_id = get_bits(tmp, 15, 8); die_id_bit = 0; /* If interleaved over more than 1 channel: */ @@ -812,16 +815,16 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) /* If interleaved over more than 1 die. */ if (intlv_num_dies) { sock_id_bit = die_id_bit + intlv_num_dies; - die_id_shift = (tmp >> 24) & 0xF; - die_id_mask = (tmp >> 8) & 0xFF; + die_id_shift = get_bits(tmp, 27, 24); + die_id_mask = get_bits(tmp, 15, 8); cs_id |= ((cs_fabric_id & die_id_mask) >> die_id_shift) << die_id_bit; } /* If interleaved over more than 1 socket. */ if (intlv_num_sockets) { - socket_id_shift = (tmp >> 28) & 0xF; - socket_id_mask = (tmp >> 16) & 0xFF; + socket_id_shift = get_bits(tmp, 31, 28); + socket_id_mask = get_bits(tmp, 23, 16); cs_id |= ((cs_fabric_id & socket_id_mask) >> socket_id_shift) << sock_id_bit; } @@ -834,7 +837,7 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) * bits there are. "intlv_addr_bit" tells us how many "Y" bits * there are (where "I" starts). */ - temp_addr_y = ret_addr & GENMASK_ULL(intlv_addr_bit-1, 0); + temp_addr_y = get_bits(ret_addr, intlv_addr_bit-1, 0); temp_addr_i = (cs_id << intlv_addr_bit); temp_addr_x = (ret_addr & GENMASK_ULL(63, intlv_addr_bit)) << num_intlv
[PATCH v2 8/8] x86/MCE/AMD Support new memory interleaving modes during address translation
From: Muralidhara M K Add support for new memory interleaving modes used in current AMD systems. Check if the system is using a current Data Fabric version or a legacy version as some bit and register definitions have changed. Tested on AMD reference platforms with the following memory interleaving options. Naples - None - Channel - Die - Socket Rome (NPS = Nodes per Socket) - None - NPS0 - NPS1 - NPS2 - NPS4 The fixes tag refers to the commit that allows amd64_edac_mod to load on Rome systems. The module may report an incorrect system addresses on Rome systems depending on the interleaving option used. Fixes: 6e846239e548 ("EDAC/amd64: Add Family 17h Model 30h PCI IDs") Signed-off-by: Muralidhara M K Co-developed-by: Naveen Krishna Chtradhi Signed-off-by: Naveen Krishna Chtradhi Co-developed-by: Yazen Ghannam Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20200814191449.183998-3-yazen.ghan...@amd.com v1 -> v2: * Rebased on cleanup patches. * Save and use the Data Fabric version. * Reorder code to execute non-legacy flows first. This change wasn't made to the section with the "hashed_bit" calculation, since the current flow reads easier IMHO. arch/x86/kernel/cpu/mce/amd.c | 222 ++ 1 file changed, 172 insertions(+), 50 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index f5440f8000e9..c14076bcabf2 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -683,8 +683,10 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c) #define DF_F0_DRAMBASEADDR 0x110 #define DF_F0_DRAMLIMITADDR0x114 #define DF_F0_DRAMOFFSET 0x1B4 +#define DF_F0_DFGLOBALCTRL 0x3F8 #define DF_F1_SYSFABRICID 0x208 +#define DF_F1_SYSFABRICID1 0x20C int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) { @@ -695,22 +697,30 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) u32 dram_hole_base; u32 reg_dram_base_addr, reg_dram_limit_addr; - u32 reg_dram_offset; + u32 reg_dram_offset, reg_sys_fabric_id; + + bool hash_enabled = false, split_normalized = false; - u8 die_id_shift, die_id_mask, socket_id_shift, socket_id_mask; u8 intlv_num_dies, intlv_num_chan, intlv_num_sockets; - u8 intlv_addr_sel, intlv_addr_bit; - u8 num_intlv_bits, hashed_bit; + u8 intlv_addr_sel, intlv_addr_bit, num_intlv_bits; + u8 cs_mask, cs_id = 0, dst_fabric_id = 0; u8 lgcy_mmio_hole_en, base = 0; - u8 cs_mask, cs_id = 0; - bool hash_enabled = false; + u8 df_version; + + if (amd_df_indirect_read(nid, 1, DF_F1_SYSFABRICID, umc, ®_sys_fabric_id)) + goto out_err; + + df_version = (reg_sys_fabric_id & 0xFF) ? 3 : 2; if (amd_df_indirect_read(nid, 0, DF_F0_DRAMOFFSET, umc, ®_dram_offset)) goto out_err; /* Remove HiAddrOffset from normalized address, if enabled: */ if (reg_dram_offset & BIT(0)) { - u64 hi_addr_offset = get_bits(reg_dram_offset, 31, 20) << 28; + u8 hi_addr_offset_lsb = (df_version >= 3) ? 12 : 20; + u64 hi_addr_offset = get_bits(reg_dram_offset, 31, hi_addr_offset_lsb); + + hi_addr_offset <<= 28; /* Check if base 1 is used. */ if (norm_addr >= hi_addr_offset) { @@ -733,19 +743,23 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) goto out_err; lgcy_mmio_hole_en = get_bit(reg_dram_base_addr, 1); - intlv_num_chan= get_bits(reg_dram_base_addr, 7, 4); - intlv_addr_sel= get_bits(reg_dram_base_addr, 10, 8); dram_base_addr= get_bits(reg_dram_base_addr, 31, 12) << 28; - - intlv_num_sockets = get_bit(reg_dram_limit_addr, 8); - intlv_num_dies= get_bits(reg_dram_limit_addr, 11, 10); dram_limit_addr = (get_bits(reg_dram_limit_addr, 31, 12) << 28) | GENMASK_ULL(27, 0); - /* {0, 1, 2, 3} map to address bits {8, 9, 10, 11} respectively */ - if (intlv_addr_sel > 3) { - pr_err("%s: Invalid interleave address select %d.\n", - __func__, intlv_addr_sel); - goto out_err; + if (df_version >= 3) { + intlv_num_chan= get_bits(reg_dram_base_addr, 5, 2); + intlv_num_dies= get_bits(reg_dram_base_addr, 7, 6); + intlv_num_sockets = get_bit(reg_dram_base_addr, 8); + intlv_addr_sel= get_bits(reg_dram_base_addr, 11, 9); + + dst_fabric_id = get_bits(reg_dram_limit_addr, 9, 0); + } else { + intlv_num_chan= get_bits(reg_dram_base_addr, 7, 4); + intlv_addr_sel= get_bits(reg_dram_base_addr, 10, 8); + + dst_fabric_id
[PATCH v2 7/8] x86/MCE/AMD: Group register reads in translation code
From: Yazen Ghannam ...so that bitfield extraction can be done together to simplify future patches. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20200814191449.183998-3-yazen.ghan...@amd.com v1 -> v2: * New patch based on comments for v1 Patch 2. arch/x86/kernel/cpu/mce/amd.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index 5a18937ff7cd..f5440f8000e9 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -729,11 +729,18 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) goto out_err; } + if (amd_df_indirect_read(nid, 0, DF_F0_DRAMLIMITADDR + (8 * base), umc, ®_dram_limit_addr)) + goto out_err; + lgcy_mmio_hole_en = get_bit(reg_dram_base_addr, 1); intlv_num_chan= get_bits(reg_dram_base_addr, 7, 4); intlv_addr_sel= get_bits(reg_dram_base_addr, 10, 8); dram_base_addr= get_bits(reg_dram_base_addr, 31, 12) << 28; + intlv_num_sockets = get_bit(reg_dram_limit_addr, 8); + intlv_num_dies= get_bits(reg_dram_limit_addr, 11, 10); + dram_limit_addr = (get_bits(reg_dram_limit_addr, 31, 12) << 28) | GENMASK_ULL(27, 0); + /* {0, 1, 2, 3} map to address bits {8, 9, 10, 11} respectively */ if (intlv_addr_sel > 3) { pr_err("%s: Invalid interleave address select %d.\n", @@ -741,13 +748,6 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) goto out_err; } - if (amd_df_indirect_read(nid, 0, DF_F0_DRAMLIMITADDR + (8 * base), umc, ®_dram_limit_addr)) - goto out_err; - - intlv_num_sockets = get_bit(reg_dram_limit_addr, 8); - intlv_num_dies= get_bits(reg_dram_limit_addr, 11, 10); - dram_limit_addr = (get_bits(reg_dram_limit_addr, 31, 12) << 28) | GENMASK_ULL(27, 0); - intlv_addr_bit = intlv_addr_sel + 8; /* Re-use intlv_num_chan by setting it equal to log2(#channels) */ -- 2.25.1
[PATCH v2 6/8] x86/MCE/AMD: Drop tmp variable in translation code
From: Yazen Ghannam Remove the "tmp" variable used to save register values. Save the values in existing variables, if possible. The register values are 32 bits. Use separate "reg_" variables to hold the register values if the existing variable sizes doesn't match, or if no bitfields in a register share the same name as the register. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20200814191449.183998-3-yazen.ghan...@amd.com v1 -> v2: * New patch based on comments for v1 Patch 2. arch/x86/kernel/cpu/mce/amd.c | 56 +++ 1 file changed, 30 insertions(+), 26 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index 90c3ad61ae19..5a18937ff7cd 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -688,11 +688,14 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c) int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) { - u64 dram_base_addr, dram_limit_addr, dram_hole_base; /* We start from the normalized address */ u64 ret_addr = norm_addr; - u32 tmp; + u64 dram_base_addr, dram_limit_addr; + u32 dram_hole_base; + + u32 reg_dram_base_addr, reg_dram_limit_addr; + u32 reg_dram_offset; u8 die_id_shift, die_id_mask, socket_id_shift, socket_id_mask; u8 intlv_num_dies, intlv_num_chan, intlv_num_sockets; @@ -702,12 +705,12 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) u8 cs_mask, cs_id = 0; bool hash_enabled = false; - if (amd_df_indirect_read(nid, 0, DF_F0_DRAMOFFSET, umc, &tmp)) + if (amd_df_indirect_read(nid, 0, DF_F0_DRAMOFFSET, umc, ®_dram_offset)) goto out_err; /* Remove HiAddrOffset from normalized address, if enabled: */ - if (tmp & BIT(0)) { - u64 hi_addr_offset = get_bits(tmp, 31, 20) << 28; + if (reg_dram_offset & BIT(0)) { + u64 hi_addr_offset = get_bits(reg_dram_offset, 31, 20) << 28; /* Check if base 1 is used. */ if (norm_addr >= hi_addr_offset) { @@ -716,20 +719,20 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) } } - if (amd_df_indirect_read(nid, 0, DF_F0_DRAMBASEADDR + (8 * base), umc, &tmp)) + if (amd_df_indirect_read(nid, 0, DF_F0_DRAMBASEADDR + (8 * base), umc, ®_dram_base_addr)) goto out_err; /* Check if address range is valid. */ - if (!(tmp & BIT(0))) { + if (!(reg_dram_base_addr & BIT(0))) { pr_err("%s: Invalid DramBaseAddress range: 0x%x.\n", - __func__, tmp); + __func__, reg_dram_base_addr); goto out_err; } - lgcy_mmio_hole_en = get_bit(tmp, 1); - intlv_num_chan= get_bits(tmp, 7, 4); - intlv_addr_sel= get_bits(tmp, 10, 8); - dram_base_addr= get_bits(tmp, 31, 12) << 28; + lgcy_mmio_hole_en = get_bit(reg_dram_base_addr, 1); + intlv_num_chan= get_bits(reg_dram_base_addr, 7, 4); + intlv_addr_sel= get_bits(reg_dram_base_addr, 10, 8); + dram_base_addr= get_bits(reg_dram_base_addr, 31, 12) << 28; /* {0, 1, 2, 3} map to address bits {8, 9, 10, 11} respectively */ if (intlv_addr_sel > 3) { @@ -738,12 +741,12 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) goto out_err; } - if (amd_df_indirect_read(nid, 0, DF_F0_DRAMLIMITADDR + (8 * base), umc, &tmp)) + if (amd_df_indirect_read(nid, 0, DF_F0_DRAMLIMITADDR + (8 * base), umc, ®_dram_limit_addr)) goto out_err; - intlv_num_sockets = get_bit(tmp, 8); - intlv_num_dies= get_bits(tmp, 11, 10); - dram_limit_addr = (get_bits(tmp, 31, 12) << 28) | GENMASK_ULL(27, 0); + intlv_num_sockets = get_bit(reg_dram_limit_addr, 8); + intlv_num_dies= get_bits(reg_dram_limit_addr, 11, 10); + dram_limit_addr = (get_bits(reg_dram_limit_addr, 31, 12) << 28) | GENMASK_ULL(27, 0); intlv_addr_bit = intlv_addr_sel + 8; @@ -786,17 +789,18 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) if (num_intlv_bits > 0) { u64 temp_addr_x, temp_addr_i, temp_addr_y; - u8 die_id_bit, sock_id_bit, cs_fabric_id; + u32 reg_sys_fabric_id, cs_fabric_id; + u8 die_id_bit, sock_id_bit; /* * This is the fabric id for this coherent slave. Use * umc/channel# as instance id of the coherent slave * for FICAA. */ - if (amd_df_indirect_read(nid, 0, DF_F0_FABRICINSTINFO3, umc, &tmp)) +
[PATCH v2 2/8] x86/CPU/AMD: Remove amd_get_nb_id()
From: Yazen Ghannam The Last Level Cache ID is returned by amd_get_nb_id(). In practice, this value is the same as the AMD NodeId for callers of this function. The NodeId is saved in struct cpuinfo_x86.node_id. Replace calls to amd_get_nb_id() with the logical CPU's node_id and remove the function. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20200814191449.183998-2-yazen.ghan...@amd.com v1 -> v2: * New patch. arch/x86/events/amd/core.c | 2 +- arch/x86/include/asm/processor.h | 2 -- arch/x86/kernel/amd_nb.c | 4 ++-- arch/x86/kernel/cpu/amd.c| 6 -- arch/x86/kernel/cpu/cacheinfo.c | 2 +- arch/x86/kernel/cpu/mce/amd.c| 4 ++-- arch/x86/kernel/cpu/mce/inject.c | 4 ++-- drivers/edac/amd64_edac.c| 4 ++-- drivers/edac/mce_amd.c | 2 +- 9 files changed, 11 insertions(+), 19 deletions(-) diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c index 39eb276d0277..01b9b943dcf4 100644 --- a/arch/x86/events/amd/core.c +++ b/arch/x86/events/amd/core.c @@ -538,7 +538,7 @@ static void amd_pmu_cpu_starting(int cpu) if (!x86_pmu.amd_nb_constraints) return; - nb_id = amd_get_nb_id(cpu); + nb_id = cpu_data(cpu).node_id; WARN_ON_ONCE(nb_id == BAD_APICID); for_each_online_cpu(i) { diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index a776b7886ec0..408977a323d3 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -871,10 +871,8 @@ extern int set_tsc_mode(unsigned int val); DECLARE_PER_CPU(u64, msr_misc_features_shadow); #ifdef CONFIG_CPU_SUP_AMD -extern u16 amd_get_nb_id(int cpu); extern u32 amd_get_nodes_per_socket(void); #else -static inline u16 amd_get_nb_id(int cpu) { return 0; } static inline u32 amd_get_nodes_per_socket(void) { return 0; } #endif diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c index 18f6b7c4bd79..2bd8abdbed8e 100644 --- a/arch/x86/kernel/amd_nb.c +++ b/arch/x86/kernel/amd_nb.c @@ -384,7 +384,7 @@ struct resource *amd_get_mmconfig_range(struct resource *res) int amd_get_subcaches(int cpu) { - struct pci_dev *link = node_to_amd_nb(amd_get_nb_id(cpu))->link; + struct pci_dev *link = node_to_amd_nb(cpu_data(cpu).node_id)->link; unsigned int mask; if (!amd_nb_has_feature(AMD_NB_L3_PARTITIONING)) @@ -398,7 +398,7 @@ int amd_get_subcaches(int cpu) int amd_set_subcaches(int cpu, unsigned long mask) { static unsigned int reset, ban; - struct amd_northbridge *nb = node_to_amd_nb(amd_get_nb_id(cpu)); + struct amd_northbridge *nb = node_to_amd_nb(cpu_data(cpu).node_id); unsigned int reg; int cuid; diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 5eef4cc1e5b7..846367a69c4a 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -424,12 +424,6 @@ static void amd_detect_ppin(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_AMD_PPIN); } -u16 amd_get_nb_id(int cpu) -{ - return per_cpu(cpu_llc_id, cpu); -} -EXPORT_SYMBOL_GPL(amd_get_nb_id); - u32 amd_get_nodes_per_socket(void) { return nodes_per_socket; diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c index 81dfddae4470..8e34e90bb872 100644 --- a/arch/x86/kernel/cpu/cacheinfo.c +++ b/arch/x86/kernel/cpu/cacheinfo.c @@ -580,7 +580,7 @@ static void amd_init_l3_cache(struct _cpuid4_info_regs *this_leaf, int index) if (index < 3) return; - node = amd_get_nb_id(smp_processor_id()); + node = cpu_data(smp_processor_id()).node_id; this_leaf->nb = node_to_amd_nb(node); if (this_leaf->nb && !this_leaf->nb->l3_cache.indices) amd_calc_l3_indices(this_leaf->nb); diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index 0c6b02dd744c..be96f77004ad 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -1341,7 +1341,7 @@ static int threshold_create_bank(struct threshold_bank **bp, unsigned int cpu, return -ENODEV; if (is_shared_bank(bank)) { - nb = node_to_amd_nb(amd_get_nb_id(cpu)); + nb = node_to_amd_nb(cpu_data(cpu).node_id); /* threshold descriptor already initialized on this node? */ if (nb && nb->bank4) { @@ -1445,7 +1445,7 @@ static void threshold_remove_bank(struct threshold_bank *bank) * The last CPU on this node using the shared bank is going * away, remove that bank now. */ - nb = node_to_amd_nb(amd_get_nb_id(smp_processor_id())); + nb = node_to_amd_nb(cpu_data(smp_processor_id()).node_id); nb->bank4 = NULL; } diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cp
[PATCH v2 1/8] x86/CPU/AMD: Save NodeId on AMD-based systems
From: Yazen Ghannam AMD systems provide a "NodeId" value that represents a global ID indicating to which "Node" a logical CPU belongs. The "Node" is a physical structure equivalent to a Die, and it should not be confused with logical structures like NUMA node. Logical nodes can be adjusted based on firmware or other settings whereas the physical nodes/dies are fixed based on hardware topology. The NodeId value can be used when a physical ID is needed by software. Save the AMD NodeId to struct cpuinfo_x86. Use the value from CPUID or MSR as appropriate. Default to phys_proc_id otherwise. Do so for both AMD and Hygon systems. Drop the node_id parameter from cacheinfo_*_init_llc_id() as it is no longer needed. Signed-off-by: Yazen Ghannam --- Link: https://lkml.kernel.org/r/20200814191449.183998-2-yazen.ghan...@amd.com v1 -> v2: * New patch based on review comment to save value to struct cpuinfo_x86. arch/x86/include/asm/cacheinfo.h | 4 ++-- arch/x86/include/asm/processor.h | 1 + arch/x86/kernel/cpu/amd.c| 11 +-- arch/x86/kernel/cpu/cacheinfo.c | 6 +++--- arch/x86/kernel/cpu/hygon.c | 11 +-- 5 files changed, 16 insertions(+), 17 deletions(-) diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h index 86b63c7feab7..86b2e0dcc4bf 100644 --- a/arch/x86/include/asm/cacheinfo.h +++ b/arch/x86/include/asm/cacheinfo.h @@ -2,7 +2,7 @@ #ifndef _ASM_X86_CACHEINFO_H #define _ASM_X86_CACHEINFO_H -void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu, u8 node_id); -void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu, u8 node_id); +void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu); +void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu); #endif /* _ASM_X86_CACHEINFO_H */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 97143d87994c..a776b7886ec0 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -95,6 +95,7 @@ struct cpuinfo_x86 { /* CPUID returned core id bits: */ __u8x86_coreid_bits; __u8cu_id; + __u8node_id; /* Max extended CPUID function supported: */ __u32 extended_cpuid_level; /* Maximum supported CPUID level, -1=no CPUID: */ diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index dcc3d943c68f..5eef4cc1e5b7 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -330,7 +330,6 @@ static void legacy_fixup_core_id(struct cpuinfo_x86 *c) */ static void amd_get_topology(struct cpuinfo_x86 *c) { - u8 node_id; int cpu = smp_processor_id(); /* get information required for multi-node processors */ @@ -340,7 +339,7 @@ static void amd_get_topology(struct cpuinfo_x86 *c) cpuid(0x801e, &eax, &ebx, &ecx, &edx); - node_id = ecx & 0xff; + c->node_id = ecx & 0xff; if (c->x86 == 0x15) c->cu_id = ebx & 0xff; @@ -360,15 +359,15 @@ static void amd_get_topology(struct cpuinfo_x86 *c) if (!err) c->x86_coreid_bits = get_count_order(c->x86_max_cores); - cacheinfo_amd_init_llc_id(c, cpu, node_id); + cacheinfo_amd_init_llc_id(c, cpu); } else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) { u64 value; rdmsrl(MSR_FAM10H_NODE_ID, value); - node_id = value & 7; + c->node_id = value & 7; - per_cpu(cpu_llc_id, cpu) = node_id; + per_cpu(cpu_llc_id, cpu) = c->node_id; } else return; @@ -393,7 +392,7 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c) /* Convert the initial APIC ID into the socket ID */ c->phys_proc_id = c->initial_apicid >> bits; /* use socket ID also for last level cache */ - per_cpu(cpu_llc_id, cpu) = c->phys_proc_id; + per_cpu(cpu_llc_id, cpu) = c->node_id = c->phys_proc_id; } static void amd_detect_ppin(struct cpuinfo_x86 *c) diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c index 57074cf3ad7c..81dfddae4470 100644 --- a/arch/x86/kernel/cpu/cacheinfo.c +++ b/arch/x86/kernel/cpu/cacheinfo.c @@ -646,7 +646,7 @@ static int find_num_cache_leaves(struct cpuinfo_x86 *c) return i; } -void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu, u8 node_id) +void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu) { /* * We may have multiple LLCs if L3 caches exist, so check if we @@ -657,7 +657,7 @@ void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu, u8 node_id) if (c->x86 < 0x17) { /* LLC is at the nod
Re: [PATCH] x86/mce: Increase maximum number of banks to 64
On Thu, Aug 20, 2020 at 06:15:15PM +, Luck, Tony wrote: > >> How much does vmlinux size grow with your change? > >> > > > > It seems to get smaller. > > > > -rwxrwxr-x 1 yghannam yghannam 807634088 Aug 20 17:51 vmlinux-32banks > > -rwxrwxr-x 1 yghannam yghannam 807634072 Aug 20 17:50 vmlinux-64banks > > You need to run: > > $ size vmlinux >textdata bss dec hex filename > 203347551256968214798924477033612d7e541 > vmlinux > > Likely the extra space is added to the third element ("bss"). That doesn't > show > up in the vmlinux file, but does add to memory footprint while running. Thanks. Yeah, they're identical: textdata bss dec hex filename 15710076135193065398528 346279102106146 vmlinux-32banks 15710076135193065398528 346279102106146 vmlinux-64banks I did a quick audit of the statically allocated data structures which use MAX_NR_BANKS. Global bitmaps: - core.c / mce_banks_ce_disabled - core.c / all_banks - core.c / valid_banks - core.c / toclear - Total: 32 new bits * 4 bitmaps = 16 new bytes Per-CPU bitmaps: - core.c / mce_poll_banks - intel.c / mce_banks_owned - Total: 32 new bits * 2 bitmaps = 8 new bytes The bitmaps are arrays of longs. So this change will only affect 32-bit execution (I assume), since there will be one additional long used. There will be no additional memory use on 64-bit execution, because the size of long is 64 bits. Global structs: - amd.c / struct smca_bank smca_banks[]: 16 bytes per bank - core.c / struct mce_bank_dev mce_bank_devs[]: 56 bytes per bank - Total: 32 new banks * (16 + 56) bytes = 2304 new bytes Per-CPU structs: - core.c / struct mce_bank mce_banks_array[]: 16 bytes per bank - Total: 32 new banks * 16 bytes = 512 new bytes 32-bit Total global size increase: 2320 bytes Total per-CPU size increase: 520 bytes 64-bit Total global size increase: 2304 bytes Total per-CPU size increase: 512 bytes Is this okay? Thanks, Yazen
Re: [PATCH v2 1/2] cper, apei, mce: Pass x86 CPER through the MCA handling chain
On Fri, Aug 28, 2020 at 03:33:31PM -0500, Smita Koralahalli wrote: ... > +int apei_mce_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 > lapic_id) > +{ > + const u64 *i_mce = ((const void *) (ctx_info + 1)); > + unsigned int cpu; > + struct mce m; > + > + if (!boot_cpu_has(X86_FEATURE_SMCA)) > + return -EINVAL; > + This function is called on any context type, but it can only decode "MSR" types that follow the MCAX register layout used on Scalable MCA systems. So I think there should be a couple of checks added: 1) Context type is "MSR". 2) Register layout follows what is expected below. There's no explict way to do this, since the data is implemenation-specific. But at least there can be a check that the starting MSR address matches the first expected register: Bank's MCA_STATUS in MCAX space (0xC0002XX1). For example: (ctx_info->msr_addr & 0xC0002001) == 0xC0002001 The raw value in the example should be defined with a name. > + mce_setup(&m); > + > + m.extcpu = -1; > + m.socketid = -1; > + > + for_each_possible_cpu(cpu) { > + if (cpu_data(cpu).initial_apicid == lapic_id) { > + m.extcpu = cpu; > + m.socketid = cpu_data(m.extcpu).phys_proc_id; > + break; > + } > + } > + > + m.apicid = lapic_id; > + m.bank = (ctx_info->msr_addr >> 4) & 0xFF; > + m.status = *i_mce; > + m.addr = *(i_mce + 1); > + m.misc = *(i_mce + 2); > + /* Skipping MCA_CONFIG */ > + m.ipid = *(i_mce + 4); > + m.synd = *(i_mce + 5); > + > + mce_log(&m); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(apei_mce_report_x86_error); > + Thanks, Yazen
Re: [PATCH 1/2] x86/MCE/AMD, EDAC/mce_amd: Use AMD NodeId for Family17h+ DRAM Decode
On Sat, Aug 15, 2020 at 10:42:12AM +0200, Ingo Molnar wrote: > > * Yazen Ghannam wrote: > > > From: Yazen Ghannam > > > > The edac_mce_amd module calls decode_dram_ecc() on AMD Family17h and > > later systems. This function is used in amd64_edac_mod to do > > system-specific decoding for DRAM ECC errors. The function takes a > > "NodeId" as a parameter. > > > > In AMD documentation, NodeId is used to identify a physical die in a > > system. This can be used to identify a node in the AMD_NB code and also > > it is used with umc_normaddr_to_sysaddr(). > > > > However, the input used for decode_dram_ecc() is currently the NUMA node > > of a logical CPU. In the default configuration, the NUMA node and > > physical die will be equivalent, so this doesn't have an impact. But the > > NUMA node configuration can be adjusted with optional memory > > interleaving schemes. This will cause the NUMA node enumeration to not > > match the physical die enumeration. The mismatch will cause the address > > translation function to fail or report incorrect results. > > > > Save the "NodeId" as a percpu value during init in AMD MCE code. Export > > a function to return the value which can be used from modules like > > edac_mce_amd. > > > > Fixes: fbe63acf62f5 ("EDAC, mce_amd: Use cpu_to_node() to find the node ID") > > Signed-off-by: Yazen Ghannam > > --- > > arch/x86/include/asm/mce.h| 2 ++ > > arch/x86/kernel/cpu/mce/amd.c | 11 +++ > > drivers/edac/mce_amd.c| 2 +- > > 3 files changed, 14 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h > > index cf503824529c..92527cc9ed06 100644 > > --- a/arch/x86/include/asm/mce.h > > +++ b/arch/x86/include/asm/mce.h > > @@ -343,6 +343,8 @@ extern struct smca_bank smca_banks[MAX_NR_BANKS]; > > extern const char *smca_get_long_name(enum smca_bank_types t); > > extern bool amd_mce_is_memory_error(struct mce *m); > > > > +extern u8 amd_cpu_to_node(unsigned int cpu); > > + > > extern int mce_threshold_create_device(unsigned int cpu); > > extern int mce_threshold_remove_device(unsigned int cpu); > > > > diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c > > index 99be063fcb1b..524edf81e287 100644 > > --- a/arch/x86/kernel/cpu/mce/amd.c > > +++ b/arch/x86/kernel/cpu/mce/amd.c > > @@ -202,6 +202,9 @@ static DEFINE_PER_CPU(unsigned int, bank_map); > > /* Map of banks that have more than MCA_MISC0 available. */ > > static DEFINE_PER_CPU(u32, smca_misc_banks_map); > > > > +/* CPUID_Fn801E_ECX[NodeId] used to identify a physical node/die. */ > > +static DEFINE_PER_CPU(u8, node_id); > > + > > static void amd_threshold_interrupt(void); > > static void amd_deferred_error_interrupt(void); > > > > @@ -233,6 +236,12 @@ static void smca_set_misc_banks_map(unsigned int bank, > > unsigned int cpu) > > > > } > > > > +u8 amd_cpu_to_node(unsigned int cpu) > > +{ > > + return per_cpu(node_id, cpu); > > +} > > +EXPORT_SYMBOL_GPL(amd_cpu_to_node); > > + > > static void smca_configure(unsigned int bank, unsigned int cpu) > > { > > unsigned int i, hwid_mcatype; > > @@ -240,6 +249,8 @@ static void smca_configure(unsigned int bank, unsigned > > int cpu) > > u32 high, low; > > u32 smca_config = MSR_AMD64_SMCA_MCx_CONFIG(bank); > > > > + this_cpu_write(node_id, cpuid_ecx(0x801e) & 0xFF); > > So we already have this magic number used for a similar purpose, in > amd_get_topology(): > > cpuid(0x801e, &eax, &ebx, &ecx, &edx); > > node_id = ecx & 0xff; > Yes, that's right. I did have a patch that tried to leverage the existing topology variables. But it wasn't working for all targeted systems. So I thought to have something local to the AMD MCA code in order to avoid messing with the topology code just for this feature. > Firstly, could we please at least give 0x801e a proper symbolic > name, use it in hygon.c too (which AFAIK is derived from AMD anyway), > and then use it in these new patches? > Sure, but all places that use a symbolic name for a CPUID leaf define it locally. Should the same be done here? Or should there be common place for all the defines like in or maybe a new header file? > Secondly, why not stick node_id into struct cpuinfo_x86, where the MCA > code can then use it without having to introduce a new percpu data > structure? > I think this would be the simplest approach. I can write it. Also, the amd_get_nb_id() function could then be replaced with this. > There's also the underlying assumption that there's only ever going to > be 256 nodes, which limitation I'm sure we'll hear about in a couple > of years as not being quite enough. ;-) > Yeah, CPU topology seems very fractal-like. :) > So less hardcoding and more generalizations please. > Will do. Thanks, Yazen