Em Tue, 17 Sep 2024 13:59:34 +0200 Igor Mammedov <imamm...@redhat.com> escreveu:
> On Sat, 14 Sep 2024 08:13:28 +0200 > Mauro Carvalho Chehab <mchehab+hua...@kernel.org> wrote: > > > The current logic is based on a lot of duct tape, with > > offsets calculated based on one define with the number of > > source IDs and an enum. > > > > Rewrite the logic in a way that it would be more resilient > > of code changes, by moving the source ID count to an enum > > and make the offset calculus more explicit. > > > > Such change was inspired on a patch from Jonathan Cameron > > splitting the logic to get the CPER address on a separate > > function, as this will be needed to support generic error > > injection. > > so this patch switches to using HEST to lookup error status block > by source id, though nothing in commit message mentions that. > Perhaps it's time to rewrite commit message to be more > specific/clear on what it's doing. > > now, I'd split this on several patches that should also take care of > wiring needed to preserve old lookup to keep migration with 9.1 machines > working: > > 1. extract error status block read_ack addresses calculation/lookup > into a separate function (just current code, without HEST lookup). Done. This is at the cleanup series. > 2. since old lookup code handles only 1 source and won't ever see multiple > error sources, simplify that as much as possible to keep old way simple > (and mention that in commit message). > it basically should take 1st error status block/read ack addresses from > hardware_errors Done there too. > > 3. add to AcpiGedState a callback to with signature of #1 lookup func. > (which will be used to switch between old/new code), by default set to > old lookup func > > 4. add to GED a bool property x-has-hardware_errors_addr = 1 > and use it in acpi_ged_realize() to set callback > > if x-has-hardware_errors_addr == 1 > callback = old_lookup Instead of a callback, I opted to define a boolean: DEFINE_PROP_BOOL("x-has-hest-addr", AcpiGedState, ghes_state.hest_lookup, true), That avoided the need of exporting the math logic and deal with callbacks. It also avoided the need of touching acpi_ged_realize(). > > 5. add new HEST lookup func (not used yet with mentioning that follow up > patch > will use it) > > 6. cleanup fwcfg based on x-has-hardware_errors_addr, > i.e. for 'true': > ask for write pointer to hardware_errors like it's done in current > code > and don't register hest_addr write pointer > while for 'false' > do opposite of above. This doesn't work. without the fw_cfg logic for both, QEMU/BIOS won't boot and/or the hardware_errors won't work, causing ghes to do nothing. > 7. flip x-has-hardware_errors_addr default to 0 and add to hw/core/machine.c > hw_compat_9_1[] = { > {ged type, 'x-has-hardware_errors_addr', 'true'}, > } > this will make 9.1 and older machine types to use old lookup callback > while newer machines will use default new lookup I opted to add the changes on machine.c at the same patch. See the RFC series I posted. > that should take care of switching between new and old ABI, > i.e. which code qemu and guest will use/see. With the new series, both virt-9.1 and virt-9.2 are doing the right thing using either the legacy or the new math. I double checked with a small debug patch, and changing the virt-9.1 code to use GPIO, as it is a way easier to inject errors than via SEA (SEA requires the host OS to generate a bus error on a hw address used by the VM). > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+hua...@kernel.org> > > > > --- > > > > Changes from v9: > > - patch split on multiple patches to avoid multiple changes > > at the same patch. > > > > Changes from v8: > > - Non-rename/cleanup changes merged altogether; > > - source ID is now more generic, defined per guest target. > > That should make easier to add support for 86. > > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+hua...@kernel.org> > > --- > > hw/acpi/ghes.c | 95 ++++++++++++++++++++++++++++++------------ > > include/hw/acpi/ghes.h | 6 ++- > > 2 files changed, 73 insertions(+), 28 deletions(-) > > > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c > > index 36fe5f68782f..6e5f0e8cf0c9 100644 > > --- a/hw/acpi/ghes.c > > +++ b/hw/acpi/ghes.c > > @@ -61,6 +61,23 @@ > > */ > > #define ACPI_GHES_GESB_SIZE 20 > > > > +/* > > + * Offsets with regards to the start of the HEST table stored at > > + * ags->hest_addr_le, according with the memory layout map at > > + * docs/specs/acpi_hest_ghes.rst. > > + */ > > + > > +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2 > > + * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure > > + */ > > +#define HEST_GHES_V2_TABLE_SIZE 92 > > +#define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET) > > + > > +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source > > + * Table 18-380: 'Error Status Address' field > > + */ > > +#define GHES_ERR_ST_ADDR_OFFSET (20 + GAS_ADDR_OFFSET) > > + > > /* > > * Values for error_severity field > > */ > > @@ -401,11 +418,13 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, > > FWCfgState *s, > > > > int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address) > > { > > - uint64_t error_block_addr, read_ack_register_addr, read_ack_register = > > 0; > > - uint64_t start_addr; > > - bool ret = -1; > > + uint64_t hest_read_ack_start_addr, read_ack_start_addr; > > + uint64_t hest_addr, cper_addr, err_source_struct; > > + uint64_t hest_err_block_addr, error_block_addr; > > + uint32_t i, ret; > > AcpiGedState *acpi_ged_state; > > AcpiGhesState *ags; > > + uint64_t read_ack; > > > > assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT); > > > > @@ -414,44 +433,66 @@ int acpi_ghes_record_errors(uint8_t source_id, > > uint64_t physical_address) > > g_assert(acpi_ged_state); > > ags = &acpi_ged_state->ghes_state; > > > > - start_addr = le64_to_cpu(ags->ghes_addr_le); > > + hest_addr = le64_to_cpu(ags->hest_addr_le); > > > > if (!physical_address) { > > return -1; > > } > > > > - start_addr += source_id * sizeof(uint64_t); > > + err_source_struct = hest_addr + ACPI_GHES_ERROR_SOURCE_COUNT; > > > > - cpu_physical_memory_read(start_addr, &error_block_addr, > > - sizeof(error_block_addr)); > > + /* > > + * Currently, HEST Error source navigates only for GHESv2 tables > > + */ > > + for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) { > > + uint64_t addr = err_source_struct; > > + uint16_t type, src_id; > > > > - error_block_addr = le64_to_cpu(error_block_addr); > > + cpu_physical_memory_read(addr, &type, sizeof(type)); > > > > - read_ack_register_addr = start_addr + > > - ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t); > > + /* For now, we only know the size of GHESv2 table */ > > + assert(type == ACPI_GHES_SOURCE_GENERIC_ERROR_V2); > > > > - cpu_physical_memory_read(read_ack_register_addr, > > - &read_ack_register, > > sizeof(read_ack_register)); > > + /* It is GHES. Compare CPER source address */ > > + addr += sizeof(type); > > + cpu_physical_memory_read(addr, &src_id, sizeof(src_id)); > > > > - /* zero means OSPM does not acknowledge the error */ > > - if (!read_ack_register) { > > - error_report("OSPM does not acknowledge previous error," > > - " so can not record CPER for current error anymore"); > > - } else if (error_block_addr) { > > - read_ack_register = cpu_to_le64(0); > > - /* > > - * Clear the Read Ack Register, OSPM will write it to 1 when > > - * it acknowledges this error. > > - */ > > - cpu_physical_memory_write(read_ack_register_addr, > > - &read_ack_register, sizeof(uint64_t)); > > + if (src_id == source_id) { > > + break; > > + } > > > > - ret = acpi_ghes_record_mem_error(error_block_addr, > > - physical_address); > > - } else { > > + err_source_struct += HEST_GHES_V2_TABLE_SIZE; > > + } > > + if (i == ACPI_GHES_ERROR_SOURCE_COUNT) { > > error_report("can not find Generic Error Status Block"); > > + > > + return -1; > > } > > > > + /* Navigate though table address pointers */ > > + > > + hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET; > > + hest_read_ack_start_addr = err_source_struct + GHES_ACK_OFFSET; > > + > > + cpu_physical_memory_read(hest_err_block_addr, &error_block_addr, > > + sizeof(error_block_addr)); > > + > > + cpu_physical_memory_read(error_block_addr, &cper_addr, > > + sizeof(error_block_addr)); > > + > > + cpu_physical_memory_read(hest_read_ack_start_addr, > > &read_ack_start_addr, > > + sizeof(read_ack_start_addr)); > > + > > + /* Update ACK offset to notify about a new error */ > > + > > + cpu_physical_memory_read(read_ack_start_addr, > > + &read_ack, sizeof(read_ack)); > > + > > + read_ack = cpu_to_le64(0); > > + cpu_physical_memory_write(read_ack_start_addr, > > + &read_ack, sizeof(read_ack)); > > + > > + ret = acpi_ghes_record_mem_error(error_block_addr, physical_address); > > return ret; > > } > > > > diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h > > index c9bbda4740e2..7485f54c28f2 100644 > > --- a/include/hw/acpi/ghes.h > > +++ b/include/hw/acpi/ghes.h > > @@ -23,6 +23,7 @@ > > #define ACPI_GHES_H > > > > #include "hw/acpi/bios-linker-loader.h" > > +#include "qapi/error.h" > > > > /* > > * Values for Hardware Error Notification Type field > > @@ -74,7 +75,10 @@ void acpi_build_hest(GArray *table_data, GArray > > *hardware_errors, > > const char *oem_id, const char *oem_table_id); > > void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s, > > GArray *hardware_errors); > > -int acpi_ghes_record_errors(uint8_t source_id, uint64_t > > error_physical_addr); > > +int acpi_ghes_record_errors(uint8_t source_id, > > + uint64_t error_physical_addr); > > +void ghes_record_cper_errors(const void *cper, size_t len, > > + uint16_t source_id, Error **errp); > > > > /** > > * acpi_ghes_present: Report whether ACPI GHES table is present > Thanks, Mauro