On Wed, 26 Feb 2025 17:14:06 +0100 Mauro Carvalho Chehab <mchehab+hua...@kernel.org> wrote:
> Em Tue, 25 Feb 2025 10:43:27 +0100 > Igor Mammedov <imamm...@redhat.com> escreveu: > > > On Fri, 21 Feb 2025 07:02:21 +0100 > > Mauro Carvalho Chehab <mchehab+hua...@kernel.org> wrote: > > > > > Em Mon, 3 Feb 2025 15:34:23 +0100 > > > Igor Mammedov <imamm...@redhat.com> escreveu: > > > > > > > On Fri, 31 Jan 2025 18:42:44 +0100 > > > > Mauro Carvalho Chehab <mchehab+hua...@kernel.org> wrote: > > > > > > > > > There are two pointers that are needed during error injection: > > > > > > > > > > 1. The start address of the CPER block to be stored; > > > > > 2. The address of the ack. > > > > > > > > > > It is preferable to calculate them from the HEST table. This allows > > > > > checking the source ID, the size of the table and the type of the > > > > > HEST error block structures. > > > > > > > > > > Yet, keep the old code, as this is needed for migration purposes. > > > > > > > > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+hua...@kernel.org> > > > > > --- > > > > > hw/acpi/ghes.c | 132 > > > > > ++++++++++++++++++++++++++++++++++++----- > > > > > include/hw/acpi/ghes.h | 1 + > > > > > 2 files changed, 119 insertions(+), 14 deletions(-) > > > > > > > > > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c > > > > > index 27478f2d5674..8f284fd191a6 100644 > > > > > --- a/hw/acpi/ghes.c > > > > > +++ b/hw/acpi/ghes.c > > > > > @@ -41,6 +41,12 @@ > > > > > /* Address offset in Generic Address Structure(GAS) */ > > > > > #define GAS_ADDR_OFFSET 4 > > > > > > > > > > +/* > > > > > + * ACPI spec 1.0b > > > > > + * 5.2.3 System Description Table Header > > > > > + */ > > > > > +#define ACPI_DESC_HEADER_OFFSET 36 > > > > > + > > > > > /* > > > > > * The total size of Generic Error Data Entry > > > > > * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data, > > > > > @@ -61,6 +67,25 @@ > > > > > */ > > > > > #define ACPI_GHES_GESB_SIZE 20 > > > > > > > > > > +/* > > > > > + * Offsets with regards to the start of the HEST table stored at > > > > > + * ags->hest_addr_le, > > > > > > > > If I read this literary, then offsets above are not what > > > > declared later in this patch. > > > > I'd really drop this comment altogether as it's confusing, > > > > and rather get variables/macro naming right > > > > > > > > > according with the memory layout map at > > > > > + * docs/specs/acpi_hest_ghes.rst. > > > > > + */ > > > > > > > > what we need is update to above doc, describing new and old ways. > > > > a separate patch. > > > > > > I can't see anything that should be changed at > > > docs/specs/acpi_hest_ghes.rst, as this series doesn't change the > > > firmware layout: we're still using two firmware tables: > > > > > > - etc/acpi/tables, with HEST on it; > > > - etc/hardware_errors, with: > > > - error block addresses; > > > - read_ack registers; > > > - CPER records. > > > > > > The only changes that this series introduce are related to how > > > the error generation logic navigates between HEST and hw_errors > > > firmware. This is not described at acpi_hest_ghes.rst, and both > > > ways follow ACPI specs to the letter. > > > > > > The only difference is that the code which populates the CPER > > > record and the error/read offsets doesn't require to know how > > > the HEST table generation placed offsets, as it will basically > > > reproduce what OSPM firmware does when handling HEST events. > > > > section 8 describes old way to get to address to record old CPER, > > so it needs to amended to also describe a new approach and say > > which way is used for which version. > > > > possibly section 11 might need some messaging as well. > > Ok, I'll modify it and place at the end of the series. Please > see below if the new text is ok for you. > > --- > > [PATCH] docs/specs/acpi_hest_ghes.rst: update it to reflect some changes s/^^^/docs: hest: add new "etc/acpi_table_hest_addr" and update workflow/ > > While the HEST layout didn't change, there are some internal > changes related to how offsets are calculated and how memory error > events are triggered. > > Update specs to reflect such changes. > > Signed-off-by: Mauro Carvalho Chehab <mchehab+hua...@kernel.org> > > diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst > index c3e9f8d9a702..f22d2eefdec7 100644 > --- a/docs/specs/acpi_hest_ghes.rst > +++ b/docs/specs/acpi_hest_ghes.rst > @@ -89,12 +89,21 @@ Design Details > addresses in the "error_block_address" fields with a pointer to the > respective "Error Status Data Block" in the "etc/hardware_errors" blob. > > -(8) QEMU defines a third and write-only fw_cfg blob which is called > - "etc/hardware_errors_addr". Through that blob, the firmware can send back > - the guest-side allocation addresses to QEMU. The > "etc/hardware_errors_addr" > - blob contains a 8-byte entry. QEMU generates a single WRITE_POINTER > command > - for the firmware. The firmware will write back the start address of > - "etc/hardware_errors" blob to the fw_cfg file "etc/hardware_errors_addr". > +(8) QEMU defines a third and write-only fw_cfg blob to store the location > + where the error block offsets, read ack registers and CPER records are > + stored. > + > + Up to QEMU 9.2, the location was at "etc/hardware_errors_addr", and > + contains an offset for the beginning of "etc/hardware_errors". > + > + Newer versions place the location at "etc/acpi_table_hest_addr", > + pointing to the beginning of the HEST table. > + > + Through that such offsets, the firmware can send back the guest-side ^^^^^^^^^^^^^^^^^^^^^^^^^ can't parse that, suggest to just drop the phrase > + allocation addresses to QEMU. They contain a 8-byte entry. QEMU generates > + a single WRITE_POINTER command for the firmware. The firmware will write > + back the start address of either "etc/hardware_errors" or HEST table at ^^^^ drop this? > + the correspoinding address firmware. > > (9) When QEMU gets a SIGBUS from the kernel, QEMU writes CPER into > corresponding > "Error Status Data Block", guest memory, and then injects platform > specific > @@ -105,8 +114,6 @@ Design Details > kernel, on receiving notification, guest APEI driver could read the > CPER error > and take appropriate action. > > -(11) kvm_arch_on_sigbus_vcpu() uses source_id as index in > "etc/hardware_errors" to > - find out "Error Status Data Block" entry corresponding to error source. > So supported > - source_id values should be assigned here and not be changed afterwards > to make sure > - that guest will write error into expected "Error Status Data Block" > even if guest was > - migrated to a newer QEMU. > +(11) kvm_arch_on_sigbus_vcpu() report RAS errors via a SEA notifications, > + when a SIGBUS event is triggered. > The logic to convert a SEA notification > + into a source ID is defined inside ghes.c source file. that's cheating and not documentation by any means > > >