On Tue, Apr 29, 2025 at 07:21:09PM +0200, Fabio M. De Francesco wrote: > When Firmware First is enabled, BIOS handles errors first and then it > makes them available to the kernel via the Common Platform Error Record > (CPER) sections (UEFI 2.10 Appendix N). Linux parses the CPER sections > via one of two similar paths, either ELOG or GHES. > > Currently, ELOG and GHES show some inconsistencies in how they report to > userspace via trace events. > > Therfore make the two mentioned paths act similarly by tracing the CPER > CXL Protocol Error Section (UEFI v2.10, Appendix N.2.13) signaled by the > I/O Machine Check Architecture and reported by BIOS in FW-First. > > Cc: Dan Williams <dan.j.willi...@intel.com> > Signed-off-by: Fabio M. De Francesco <fabio.m.de.france...@linux.intel.com> > --- > drivers/acpi/acpi_extlog.c | 60 ++++++++++++++++++++++++++++++++++++++ > drivers/cxl/core/ras.c | 6 ++++ > include/cxl/event.h | 2 ++ > 3 files changed, 68 insertions(+) > > diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c > index 7d7a813169f1..8f2ff3505d47 100644 > --- a/drivers/acpi/acpi_extlog.c > +++ b/drivers/acpi/acpi_extlog.c > @@ -12,6 +12,7 @@ > #include <linux/ratelimit.h> > #include <linux/edac.h> > #include <linux/ras.h> > +#include <cxl/event.h> > #include <acpi/ghes.h> > #include <asm/cpu.h> > #include <asm/mce.h> > @@ -157,6 +158,60 @@ static void extlog_print_pcie(struct cper_sec_pcie > *pcie_err, > } > } > > +static void > +extlog_cxl_cper_handle_prot_err(struct cxl_cper_sec_prot_err *prot_err, > + int severity) > +{ > +#ifdef CONFIG_ACPI_APEI_PCIEAER
Why not apply this check on the function prototype? Reference: Documentation/process/coding-style.rst Section 21) Conditional Compilation > + struct cxl_cper_prot_err_work_data wd; > + u8 *dvsec_start, *cap_start; > + > + if (!(prot_err->valid_bits & PROT_ERR_VALID_AGENT_ADDRESS)) { > + pr_err_ratelimited("CXL CPER invalid agent type\n"); > + return; > + } > + > + if (!(prot_err->valid_bits & PROT_ERR_VALID_ERROR_LOG)) { > + pr_err_ratelimited("CXL CPER invalid protocol error log\n"); > + return; > + } > + > + if (prot_err->err_len != sizeof(struct cxl_ras_capability_regs)) { > + pr_err_ratelimited("CXL CPER invalid RAS Cap size (%u)\n", > + prot_err->err_len); > + return; > + } > + > + if (!(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER)) > + pr_warn(FW_WARN "CXL CPER no device serial number\n"); Is this a requirement (in the spec) that we should warn users about? The UEFI spec says that serial number is only used if "CXL agent" is a "CXL device". "CXL ports" won't have serial numbers. So this will be a false warning for port errors. > + > + switch (prot_err->agent_type) { > + case RCD: > + case DEVICE: > + case LD: > + case FMLD: > + case RP: > + case DSP: > + case USP: > + memcpy(&wd.prot_err, prot_err, sizeof(wd.prot_err)); > + > + dvsec_start = (u8 *)(prot_err + 1); > + cap_start = dvsec_start + prot_err->dvsec_len; > + > + memcpy(&wd.ras_cap, cap_start, sizeof(wd.ras_cap)); > + wd.severity = cper_severity_to_aer(severity); > + break; > + default: > + pr_err_ratelimited("CXL CPER invalid agent type: %d\n", "invalid" is too harsh given that the specs may be updated. Maybe say "reserved" or "unknown" or "unrecognized" instead. Hopefully things will settle down to where a user will be able to have a system with newer CXL "agents" without *requiring* a kernel update. :) Thanks, Yazen