Re: [RFC PATCH 1/1] hw/arm: FW first ARM processor error injection.
On Fri, 21 Jun 2024 19:33:16 +0100, Jonathan Cameron wrote:

> On Fri, 21 Jun 2024 17:51:15 +0100
> wrote:
>
> > From: Shiju Jose
>
> Thanks for posting this.
>
> Given this is going to linux-edac, probably should mention
> this is QEMU based error injection. For cross postings
> between kernel related and qemu lists I tend to stick
> qemu in the [] of the patch description.

Thank you for that! It is really useful.

Btw, I'm using a small script to do the error injection using netcat (nc).
It assumes that the QMP interface used for error injection is listening
on localhost port 4445, e.g. QEMU is started with:

	-qmp tcp:localhost:4445,server=on,wait=off

I also added some instructions about how to use it on the rasdaemon wiki:
	https://github.com/mchehab/rasdaemon/wiki/Error-injection-testing

Feel free to improve it.

Thanks,
Mauro

---

#!/bin/bash

trap 'catch $LINENO "$BASH_COMMAND"' ERR
catch() {
	echo "Error on line $1: $2"
	exit 1
}

ERROR_DEFAULT='"cache-error"'
ERROR=""

HELP="$0 [-c|--cache-error] [-t|--tlb-error] [-b|--bus-error] [-v|--vendor-error|--micro-arch-error]"

while [ "$1" != "" ]; do
	case "$1" in
	-c|--cache-error)
		if [ -n "$ERROR" ]; then ERROR="$ERROR, "; fi
		ERROR+='"cache-error"'
		;;
	-t|--tlb-error)
		if [ -n "$ERROR" ]; then ERROR="$ERROR, "; fi
		ERROR+='"tlb-error"'
		;;
	-b|--bus-error)
		if [ -n "$ERROR" ]; then ERROR="$ERROR, "; fi
		ERROR+='"bus-error"'
		;;
	-v|--vendor-error|--micro-arch-error)
		if [ -n "$ERROR" ]; then ERROR="$ERROR, "; fi
		ERROR+='"micro-arch-error"'
		;;
	help|-h|--help)
		echo "$HELP"
		exit 0
		;;
	*)
		# Unknown option: show the usage instead of silently ignoring it
		echo "$HELP" >&2
		exit 1
		;;
	esac
	shift
done

if [ -z "$ERROR" ]; then
	ERROR=$ERROR_DEFAULT
fi

CACHE_MSG='{ "execute": "qmp_capabilities" } '
CACHE_MSG+='{ "execute": "arm-inject-error", "arguments": { "errortypes": ['$ERROR'] } }'

# Quote the message, so the shell won't glob-expand the [ ... ] list
echo "$CACHE_MSG"
echo "$CACHE_MSG" | nc -v localhost 4445
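For reference, the payload the script pipes into nc can also be built programmatically. The sketch below is a minimal C illustration, assuming the same two QMP commands (`qmp_capabilities`, then `arm-inject-error` with an `errortypes` list) as the script; the flag names and `build_inject_msg()` are hypothetical. Unlike the script, which keeps the command-line order, it always emits the error types in a fixed order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits, one per option the script accepts */
enum {
    INJ_CACHE = 1 << 0,   /* -c / --cache-error */
    INJ_TLB   = 1 << 1,   /* -t / --tlb-error */
    INJ_BUS   = 1 << 2,   /* -b / --bus-error */
    INJ_UARCH = 1 << 3,   /* -v / --vendor-error / --micro-arch-error */
};

/*
 * Build the two QMP commands the script sends: the capabilities
 * handshake followed by arm-inject-error.
 * Returns the message length, or -1 if buf is too small.
 */
static int build_inject_msg(unsigned flags, char *buf, size_t len)
{
    static const struct {
        unsigned bit;
        const char *name;
    } types[] = {
        { INJ_CACHE, "cache-error" },
        { INJ_TLB, "tlb-error" },
        { INJ_BUS, "bus-error" },
        { INJ_UARCH, "micro-arch-error" },
    };
    char list[128] = "";
    int n;

    assert(buf != NULL && len > 0);
    if (flags == 0) {
        flags = INJ_CACHE;   /* same default as ERROR_DEFAULT */
    }
    for (size_t i = 0; i < sizeof(types) / sizeof(types[0]); i++) {
        if (flags & types[i].bit) {
            if (list[0] != '\0') {
                strcat(list, ", ");
            }
            strcat(list, "\"");
            strcat(list, types[i].name);
            strcat(list, "\"");
        }
    }
    n = snprintf(buf, len,
                 "{ \"execute\": \"qmp_capabilities\" } "
                 "{ \"execute\": \"arm-inject-error\", "
                 "\"arguments\": { \"errortypes\": [%s] } }",
                 list);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```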
Re: [PATCH v3 1/7] arm/virt: place power button pin number on a define
On Mon, 5 Aug 2024 16:04:39 +0200, Igor Mammedov wrote:

> On Thu, 1 Aug 2024 15:15:44 +0200
> Mauro Carvalho Chehab wrote:
>
> > On Tue, 30 Jul 2024 13:26:20 +0200
> > Igor Mammedov wrote:
> >
> > > On Tue, 30 Jul 2024 09:29:37 +0100
> > > Peter Maydell wrote:
> > >
> > > > On Tue, 30 Jul 2024 at 08:26, Igor Mammedov
> > > > wrote:
> > > > >
> > > > > On Mon, 22 Jul 2024 08:45:53 +0200
> > > > > Mauro Carvalho Chehab wrote:
> > > > >
> > > > > > Having magic numbers inside the code is not a good idea, as it
> > > > > > is error-prone. So, instead, create a macro with the number
> > > > > > definition.
> > > > > >
> > > > > > Signed-off-by: Mauro Carvalho Chehab
> > > > > > Reviewed-by: Jonathan Cameron
> > > > >
> > > > > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > > > > > index b0c68d66a345..c99c8b1713c6 100644
> > > > > > --- a/hw/arm/virt.c
> > > > > > +++ b/hw/arm/virt.c
> > > > > > @@ -1004,7 +1004,7 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
> > > > > >      if (s->acpi_dev) {
> > > > > >          acpi_send_event(s->acpi_dev, ACPI_POWER_DOWN_STATUS);
> > > > > >      } else {
> > > > > > -        /* use gpio Pin 3 for power button event */
> > > > > > +        /* use gpio Pin for power button event */
> > > > > >          qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);
> > > > >
> > > > > /me confused, it was saying Pin 3 but is passing 0 as argument where
> > > > > as elsewhere you are passing 3. Is this a bug?
> > > >
> > > > No. The gpio_key_dev is a gpio-key device which has one
> > > > input (which you assert to "press the key") and one output,
> > > > which goes high when the key is pressed and then falls
> > > > 100ms later. The virt board wires up the output of the
> > > > gpio-key device to input 3 on the PL061 GPIO controller.
> > > > (This happens in create_gpio_keys().) So the code is correct
> > > > to assert input 0 on the gpio-key device and the comment
> > > > isn't wrong that this results in GPIO pin 3 being asserted:
> > > > the link is just indirect.
> > >
> > > it's likely obvious to ARM folks, but maybe comment should
> > > clarify above for unaware.
> >
> > Not sure if a comment here with the pin number is a good idea.
> > After all, this patch was originated because we were using
> > Pin 6 for GPIO error, while the comment was outdated (stating
> > that it was pin 8 instead) :-)
> >
> > After this series, there will be two GPIO pins used inside arm/virt,
> > both defined at arm/virt.h:
> >
> > 	/* GPIO pins */
> > 	#define GPIO_PIN_POWER_BUTTON  3
> > 	#define GPIO_PIN_GENERIC_ERROR 6
> >
> > Those macros are used when GPIOs are created:
> >
> > 	static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
> > 	                             uint32_t phandle)
> > 	{
> > 	    gpio_key_dev = sysbus_create_simple("gpio-key", -1,
> > 	                                        qdev_get_gpio_in(pl061_dev,
> > 	                                                         GPIO_PIN_POWER_BUTTON));
> > 	    gpio_error_dev = sysbus_create_simple("gpio-key", -1,
> > 	                                          qdev_get_gpio_in(pl061_dev,
> > 	                                                           GPIO_PIN_GENERIC_ERROR));
> >
> > So, at least for me, it is clear that gpio_key_dev is using pin 3.
>
> if you switch to using already existing GED device,
> then this patch will go away since event will be delivered by GED
> instead of GPIO + _AEI.

This patch is actually independent from the rest. It is related to a
power-down event, and not related at all to error injection. The
rationale for keeping it in this series was the original patch 2
(otherwise merge conflicts would arise). It can now be merged separately.

Btw, this is doing a cleanup requested by Michael and Peter:
	https://lore.kernel.org/qemu-devel/CAFEAcA-PYnZ-32MRX+PgvzhnoAV80zBKMYg61j2f=ohagfw...@mail.gmail.com/

Thanks,
Mauro
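The indirection Peter describes (assert input 0 of the gpio-key, which drives PL061 input 3) can be pictured with a toy model. The structs and `toy_key_press()` below are hypothetical stand-ins, not QEMU device code; only the two pin numbers come from the thread, and the 100 ms falling edge of the real gpio-key is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Pin numbers as defined in arm/virt.h by this series */
#define GPIO_PIN_POWER_BUTTON  3
#define GPIO_PIN_GENERIC_ERROR 6

/* Hypothetical stand-in for the PL061 input lines */
struct toy_pl061 {
    bool in[8];
};

/* Hypothetical stand-in for a gpio-key: one input, one wired output */
struct toy_gpio_key {
    struct toy_pl061 *ctrl;
    int out_pin;   /* PL061 input driven by this key's output */
};

/* Asserting the key's only input (index 0) raises the wired PL061 pin */
static void toy_key_press(struct toy_gpio_key *key, int input)
{
    assert(input == 0);                          /* a gpio-key has one input */
    assert(key->out_pin >= 0 && key->out_pin < 8);
    key->ctrl->in[key->out_pin] = true;
}
```

Usage: pressing the power key through its input 0 shows up on PL061 pin 3, while pin 6 (generic error) stays untouched.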
Re: [PATCH v5 2/7] acpi/generic_event_device: add an APEI error device
On Mon, 5 Aug 2024 17:39:46 +0100, Jonathan Cameron wrote:

> On Fri, 2 Aug 2024 23:43:57 +0200
> Mauro Carvalho Chehab wrote:
>
> > Adds a Generic Event Device to handle generic hardware error
> > events, supporting General Purpose Event (GPE) as specified at
> > ACPI 6.5 specification at 18.3.2.7.2:
> > https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
> > using HID PNP0C33.
> >
> > The PNP0C33 device is used to report hardware errors to
> > the bios via ACPI APEI Generic Hardware Error Source (GHES).
> >
> > Co-authored-by: Mauro Carvalho Chehab
> > Co-authored-by: Jonathan Cameron
> > Cc: Jonathan Cameron
>
> Much nicer with a GED event.
> Happy to give SoB on this as you requested due to changes.
>
> Signed-off-by: Jonathan Cameron
>
> One minor comment though.
> The pnp0c33 device isn't technically coupled to the generic_event_device.
> Perhaps that should be in aml_build.h/.c instead of where you
> have it here?
>
> Maybe we can move it later though if anyone implements non GED signalling?

I opted to place it in hw/acpi/generic_event_device.c, just after PNP0C0C,
e.g.:

	void acpi_dsdt_add_power_button(Aml *scope)
	{
	    Aml *dev = aml_device(ACPI_POWER_BUTTON_DEVICE);
	    aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C0C")));
	    aml_append(dev, aml_name_decl("_UID", aml_int(0)));
	    aml_append(scope, dev);
	}

	void acpi_dsdt_add_error_device(Aml *scope)
	{
	    Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
	    aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
	    aml_append(dev, aml_name_decl("_UID", aml_int(0)));
	    aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
	    aml_append(scope, dev);
	}

IMO this way it is kept closer to the other PNP devices. If this starts
to grow, a later cleanup could move them to a separate file, but, as there
are just two for now, I would keep both in the GED file.

>
> Jonathan
>
> > Signed-off-by: Mauro Carvalho Chehab
> > ---
> >  hw/acpi/generic_event_device.c         | 17 +
> >  include/hw/acpi/acpi_dev_interface.h   |  1 +
> >  include/hw/acpi/generic_event_device.h |  3 +++
> >  3 files changed, 21 insertions(+)
> >
> > diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> > index 15b4c3ebbf24..b9ad05e98c05 100644
> > --- a/hw/acpi/generic_event_device.c
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
> >      ACPI_GED_PWR_DOWN_EVT,
> >      ACPI_GED_NVDIMM_HOTPLUG_EVT,
> >      ACPI_GED_CPU_HOTPLUG_EVT,
> > +    ACPI_GED_ERROR_EVT
> >  };
> >
> >  /*
> > @@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev,
> >                             aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
> >                                        aml_int(0x80)));
> >                  break;
> > +            case ACPI_GED_ERROR_EVT:
> > +                aml_append(if_ctx,
> > +                           aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
> > +                                      aml_int(0x80)));
> > +                break;
> >              case ACPI_GED_NVDIMM_HOTPLUG_EVT:
> >                  aml_append(if_ctx,
> >                             aml_notify(aml_name("\\_SB.NVDR"),
> > @@ -153,6 +159,15 @@ void acpi_dsdt_add_power_button(Aml *scope)
> >      aml_append(scope, dev);
> >  }
> >
> > +void acpi_dsdt_add_error_device(Aml *scope)
> > +{
> > +    Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
> > +    aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
> > +    aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> > +    aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
> > +    aml_append(scope, dev);
> > +}
> > +
> >  /* Memory read by the GED _EVT AML dynamic method */
> >  static uint64_t ged_evt_read(void *opaque, hwaddr addr, unsigned size)
> >  {
> > @@ -295,6 +310,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
> >          sel = ACPI_GED_MEM_HOTPLUG_EVT;
> >      } else if (ev & ACPI_POWER_DOWN_STATUS) {
> >          sel = ACPI_GED_PWR_DOWN_EVT;
> > +    } else if (ev & ACPI_GENERIC_ERROR) {
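The last hunk extends the if/else chain that turns an event status bit into a GED selector; the `_EVT` method then reads that selector and, for the error event, does `Notify(<error device>, 0x80)`. A standalone sketch of the selection step follows. The numeric values are my assumption of what QEMU's `AcpiEventStatusBits` and `ACPI_GED_*_EVT` use and are given `TOY_` names on purpose; check the headers before relying on them. The branch order for the NVDIMM and CPU hotplug bits is also an assumption, since the hunk only shows the first three branches.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, mirroring QEMU's AcpiEventStatusBits */
enum {
    TOY_CPU_HOTPLUG_STATUS    = 4,
    TOY_MEMORY_HOTPLUG_STATUS = 8,
    TOY_NVDIMM_HOTPLUG_STATUS = 16,
    TOY_POWER_DOWN_STATUS     = 64,
    TOY_GENERIC_ERROR         = 128,
};

/* Assumed GED selector bits (ACPI_GED_*_EVT) */
#define TOY_GED_MEM_HOTPLUG_EVT    0x1
#define TOY_GED_PWR_DOWN_EVT       0x2
#define TOY_GED_NVDIMM_HOTPLUG_EVT 0x4
#define TOY_GED_CPU_HOTPLUG_EVT    0x8
#define TOY_GED_ERROR_EVT          0x10

/* Translate an event status bit into the GED selector; 0 if unsupported */
static uint32_t toy_ged_select(uint32_t ev)
{
    if (ev & TOY_MEMORY_HOTPLUG_STATUS) {
        return TOY_GED_MEM_HOTPLUG_EVT;
    } else if (ev & TOY_POWER_DOWN_STATUS) {
        return TOY_GED_PWR_DOWN_EVT;
    } else if (ev & TOY_GENERIC_ERROR) {
        return TOY_GED_ERROR_EVT;
    } else if (ev & TOY_NVDIMM_HOTPLUG_STATUS) {
        return TOY_GED_NVDIMM_HOTPLUG_EVT;
    } else if (ev & TOY_CPU_HOTPLUG_STATUS) {
        return TOY_GED_CPU_HOTPLUG_EVT;
    }
    return 0;
}
```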
Re: [PATCH v5 3/7] arm/virt: Wire up GPIO error source for ACPI / GHES
On Mon, 5 Aug 2024 17:54:00 +0100, Jonathan Cameron wrote:

> On Fri, 2 Aug 2024 23:43:58 +0200
> Mauro Carvalho Chehab wrote:
>
> Do we need to rename this now there is a GED involved?
> Is it even technically a GPIO any more?
> Spec says in 18.3.2.7
>   HW-reduced ACPI platforms signal the error using a GPIO
>   interrupt or another interrupt declared under
>   a generic event device (Interrupt-signaled ACPI events)
> and goes on to say that a _CRS entry is used to
> list the interrupt.
>
> Give the Generic Event Device has a _CRS
> with aml_interrupt() as the type I think we should
> even have the hest entry say it's an interrupt (external?)
> rather than a gpio.

True. I'll change the patch description to:

	arm/virt: Wire up a GED error device for ACPI / GHES

	Adds support to ARM virtualization to allow handling
	a General Purpose Event (GPE) via GED error device.

	It is aligned with Linux Kernel patch:
	https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.hu...@intel.com/

as the spec at
https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
refers to it as:

	"The implementation of Event notification requires the platform
	 to define a device with PNP ID PNP0C33 in the ACPI namespace,
	 referred to as the error device."

> > Adds support to ARM virtualization to allow handling
> > a General Purpose Event (GPE) via GED error device.
> >
> > It is aligned with Linux Kernel patch:
> > https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.hu...@intel.com/
> >
> > Co-authored-by: Mauro Carvalho Chehab
> > Co-authored-by: Jonathan Cameron
> > Cc: Jonathan Cameron
>
> Again, more or less fine with this
> Signed-off-by: Jonathan Cameron
> to go with that co-auth

Thanks!
Mauro
Re: [PATCH v5 4/7] acpi/ghes: Support GPIO error source
On Mon, 5 Aug 2024 17:56:17 +0100, Jonathan Cameron wrote:

> On Fri, 2 Aug 2024 23:43:59 +0200
> Mauro Carvalho Chehab wrote:
>
> > From: Jonathan Cameron
> >
> > Add error notification to GHES v2 using the GPIO source.
>
> The gpio / external interrupt follows through.

True. As section 18.3.2.7 of the spec says:

	The OSPM evaluates the control method associated with this event
	as indicated in The Event Method for Handling GPIO Signaled Events
	and The Event Method for Handling Interrupt Signaled Events.

i.e. it defines two methods:
	- GED GPIO;
	- GED interrupt.

I'm doing this rename:

	ACPI_HEST_SRC_ID_GPIO -> ACPI_HEST_SRC_ID_GED_INT

to clearly state what is implemented there.

I'm also changing the patch description to:

	acpi/ghes: Add support for General Purpose Event

	As a GED error device is now defined, add another type
	of notification.

	Add error notification to GHES v2 using the GPIO source.

	[mchehab: do some cleanups at ACPI_HEST_SRC_ID_* checks and
	 rename HEST event to better identify GED interrupt OSPM]

	Signed-off-by: Jonathan Cameron
	Signed-off-by: Mauro Carvalho Chehab

Regards,
Mauro
Re: [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection
On Tue, 6 Aug 2024 14:51:53 +0200, Igor Mammedov wrote:

> > +{ 'struct': 'CommonPlatformErrorRecord',
> > +  'data': {
>
> > +      'notification-type': 'str',
>
> this should be source id (type is just impl. detail of how QEMU delivers
> event for given source id)
> unless there is no plan to use more sources,
> I'd just drop this from API to avoid confusing user.
>
> Since the patch comes before 5/7, it's not clear how it will be used at this point.
> I'd move the patch after 5/7.

As described at:

> +# @notification-type: pre-assigned GUID string indicating the record
> +#     association with an error event notification type, as defined
> +#     at https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#record-header

this is actually the GUID of the error to be generated. Perhaps it would be
better to change the above to:

	{ 'struct': 'CommonPlatformErrorRecord',
	  'data': {
	      'guid': 'str',
	      'raw-data': 'str'
	  }
	}

making it even clearer.

In any case, this field is mandatory, as otherwise the interface would be
limited to a single error type.

Thanks,
Mauro
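Since the proposed `guid` member is a string, it eventually has to be converted into the 16-byte form UEFI stores in CPER headers. In that form the first three fields are little-endian and the last eight bytes keep string order. Below is a minimal sketch of that conversion; `guid_parse_le()` is a hypothetical helper. In-tree, QEMU's own `qemu_uuid_parse()`/`qemu_uuid_bswap()` helpers would likely be the natural choice. The test vector is the ARM processor error section GUID.

```c
#include <assert.h>
#include <stdint.h>

static int hexval(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    if (c >= 'a' && c <= 'f') {
        return c - 'a' + 10;
    }
    if (c >= 'A' && c <= 'F') {
        return c - 'A' + 10;
    }
    return -1;
}

/*
 * Parse "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" into UEFI GUID byte order.
 * Returns 0 on success, -1 on a malformed string.
 */
static int guid_parse_le(const char *s, uint8_t out[16])
{
    /* Output position of the n-th byte as written in the string */
    static const int pos[16] = { 3, 2, 1, 0, 5, 4, 7, 6,
                                 8, 9, 10, 11, 12, 13, 14, 15 };
    const char *p = s;

    for (int n = 0; n < 16; n++) {
        if (n == 4 || n == 6 || n == 8 || n == 10) {
            if (*p++ != '-') {
                return -1;
            }
        }
        int hi = hexval(p[0]);
        if (hi < 0) {
            return -1;
        }
        int lo = hexval(p[1]);
        if (lo < 0) {
            return -1;
        }
        out[pos[n]] = (uint8_t)((hi << 4) | lo);
        p += 2;
    }
    assert(p - s == 36);
    return *p == '\0' ? 0 : -1;
}
```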
Re: [PATCH v5 4/7] acpi/ghes: Support GPIO error source
On Tue, 6 Aug 2024 11:32:19 +0200, Igor Mammedov wrote:

> > @@ -327,6 +330,9 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> >           */
> >          build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
> >          break;
> > +    case ACPI_HEST_SRC_ID_GPIO:
> > +        build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_GPIO);
>
> perhaps ACPI_GHES_NOTIFY_EXTERNAL fits better here?

That symbol is already used to map the 12 possible notification types
from the ACPI spec. I did a:

	sed s,ACPI_HEST_SRC_ID_GED_INT,ACPI_HEST_NOTIFY_EXTERNAL,

instead.

Thanks,
Mauro
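The clash exists because HEST notification types and HEST source IDs are separate number spaces: the notification type is a fixed list from the spec, while source IDs are assigned by QEMU. For reference, here are the 12 Hardware Error Notification Structure types from ACPI 6.x (HEST) as a C enum. The `hest_notify_*` names are hypothetical; QEMU's `AcpiGhesNotifyType` enumerates the same values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hardware Error Notification Structure types (ACPI 6.x, HEST) */
enum hest_notify_type {
    HEST_NOTIFY_POLLED = 0,
    HEST_NOTIFY_EXTERNAL,   /* external interrupt */
    HEST_NOTIFY_LOCAL,      /* local interrupt */
    HEST_NOTIFY_SCI,
    HEST_NOTIFY_NMI,
    HEST_NOTIFY_CMCI,
    HEST_NOTIFY_MCE,
    HEST_NOTIFY_GPIO,       /* GPIO-signal */
    HEST_NOTIFY_SEA,        /* ARMv8 synchronous external abort */
    HEST_NOTIFY_SEI,        /* ARMv8 SError interrupt */
    HEST_NOTIFY_GSIV,       /* external interrupt - GSIV */
    HEST_NOTIFY_SDEI,       /* software delegated exception */
    HEST_NOTIFY_COUNT       /* 12 types */
};

static const char *hest_notify_name(enum hest_notify_type t)
{
    static const char *const names[HEST_NOTIFY_COUNT] = {
        "Polled", "External Interrupt", "Local Interrupt", "SCI", "NMI",
        "CMCI", "MCE", "GPIO-Signal", "ARMv8 SEA", "ARMv8 SEI",
        "External Interrupt - GSIV", "Software Delegated Exception",
    };
    return (unsigned)t < HEST_NOTIFY_COUNT ? names[t] : NULL;
}

/* Reverse lookup; returns -1 for an unknown name */
static int hest_notify_from_name(const char *name)
{
    for (int t = 0; t < HEST_NOTIFY_COUNT; t++) {
        if (strcmp(hest_notify_name((enum hest_notify_type)t), name) == 0) {
            return t;
        }
    }
    return -1;
}
```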
Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
On Tue, 6 Aug 2024 16:31:13 +0200, Igor Mammedov wrote:

> PS:
> looking at the code, ACPI_GHES_MAX_RAW_DATA_LENGTH is 1K
> and it is the total size of a error block for a error source.
>
> However acpi_hest_ghes.rst (3) says it should be 4K,
> am I mistaken?

Maybe Jonathan knows better, but I guess the 1K was just an arbitrary
limit to prevent a too-big CPER. The 4K limit described in
acpi_hest_ghes.rst could be just a limit to cope with the current BIOS
implementation, but I didn't check myself how this is implemented there.

I was unable to find any limit in the specs. Yet, if you look at:
	https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section

the Processor Error Information Structure array starts at offset 40 and
can have up to 255 entries of 32 bytes each, ending at offset
40 + 255 * 32 = 8200, which is bigger than 4K.

Going further, the processor context can have up to 65535 entries (the
spec actually says 65536, but that sounds like a typo, as the count is
stored in a uint16_t), containing multiple register values (the spec
calls their length "P").

So, the CPER record could, in theory, have:

	8200 + (65535 * P) + sizeof(vendor-specific-info)

The CPER length is stored in the Section Length field, which is a uint32_t.

So, I'd say that the GHES record can theoretically be a lot bigger than 4K.

Thanks,
Mauro
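The bound on the error information array can be checked directly. The sizes below come from the UEFI 2.10 ARM Processor Error Section layout: a 40-byte section header, then 32-byte Processor Error Information entries. The 255-entry limit is the one cited in this mail. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ARM_SEC_HEADER_LEN 40u   /* error info array starts at offset 40 */
#define ARM_ERR_INFO_LEN   32u   /* one Processor Error Information entry */
#define ARM_ERR_INFO_MAX   255u  /* upper bound cited in this thread */

/* Offset just past the last error info entry, for n entries */
static uint32_t arm_sec_err_info_end(uint32_t n)
{
    assert(n <= ARM_ERR_INFO_MAX);
    return ARM_SEC_HEADER_LEN + n * ARM_ERR_INFO_LEN;
}
```

With all 255 entries the array alone ends at offset 8200, before any context or vendor-specific data is added.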
Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
On Wed, 7 Aug 2024 10:34:36 +0100, Jonathan Cameron wrote:

> On Wed, 7 Aug 2024 09:47:50 +0200
> Mauro Carvalho Chehab wrote:
>
> > On Tue, 6 Aug 2024 16:31:13 +0200
> > Igor Mammedov wrote:
> >
> > > PS:
> > > looking at the code, ACPI_GHES_MAX_RAW_DATA_LENGTH is 1K
> > > and it is the total size of a error block for a error source.
> > >
> > > However acpi_hest_ghes.rst (3) says it should be 4K,
> > > am I mistaken?
> >
> > Maybe Jonathan knows better, but I guess the 1K was just some
> > arbitrary limit to prevent a too big CPER. The 4K limit described
> > at acpi_hest_ghes.rst could be just some limit to cope with
> > the current bios implementation, but I didn't check myself how
> > this is implemented there.
> >
> > I was unable to find any limit at the specs. Yet, if you look at:
> >
> > https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section
> >
>
> I think both limits are just made up. You can in theory log huge
> error records. Just not one does.

If both are made up, I would sync them, either patching the documentation
or the GHES driver.

> >
> > The processor Error Information Structure, starting at offset
> > 40, can go up to 255*32, meaning an offset of 8200, which is
> > bigger than 4K.
> >
> > Going further, processor context can have up to 65535 (spec
> > actually says 65536, but that sounds a typo, as the size is
> > stored on an uint16_t), containing multiple register values
> > there (the spec calls its length as "P").
> >
> > So, the CPER record could, in theory, have:
> > 	8200 + (65535 * P) + sizeof(vendor-specicific-info)
> >
> > The CPER length is stored in Section Length record, which is
> > uint32_t.
> >
> > So, I'd say that the GHES record can theoretically be a lot
> > bigger than 4K.
>
> Agreed - but I don't think we care for testing as long as it's
> big enough for plausible records. Unless you really want
> to fuzz the limits?

Fuzzing the limits could be interesting, but it is not in my current plans.

Yet, 1K could be a little bit short for ARM CPER. See:

	N.26 ARMv8 AArch64 GPRs (Type 4)

It has 256 bytes for registers, plus 8 bytes for the header, i.e. a total
of 264 bytes for a single context register dump.

I would expect type 4 to always be reported on aarch64 in real life, with
BIOSes that support context registers. Other types could also be dumped
together (like context registers for EL1, EL2 and/or EL3).

If just one type 4 context is encoded, then, after the 40-byte section
header, 1K has space for about 22 errors (out of a maximum of 255), and
even fewer once the GHES status block and generic error data entry
headers, which share the same 1K block, are counted.

Just looking at the maximum number, my feeling is that 1K might be too
short to simulate some real-life reports, but that depends on how
firmware actually groups such events. So, maybe this could be expanded
to, let's say, 4K, thus aligning with the ReST documentation.

Regards,
Mauro
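The "how many errors fit" estimate can be reproduced with a few lines of C. I take the 20-byte Generic Error Status Block and the 72-byte Generic Error Data Entry to be the values of QEMU's `ACPI_GHES_GESB_SIZE` and `ACPI_GHES_DATA_LENGTH`. That is an assumption, so verify it against the tree. The rest comes from the ARM Processor Error Section layout (40-byte header, 32-byte error entries, 264-byte type 4 context). `arm_err_info_fit()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define GHES_MAX_RAW_LEN    1024u        /* ACPI_GHES_MAX_RAW_DATA_LENGTH */
#define GHES_GESB_SIZE        20u        /* Generic Error Status Block (assumed) */
#define GHES_DATA_LEN         72u        /* Generic Error Data Entry (assumed) */
#define ARM_SEC_HEADER_LEN    40u
#define ARM_ERR_INFO_LEN      32u
#define ARM_CTX_AARCH64_GPR  (8u + 256u) /* type 4: header + X0..X30, SP */

/* Error info entries that fit in `budget` bytes with `nctx` type 4 contexts */
static uint32_t arm_err_info_fit(uint32_t budget, uint32_t nctx)
{
    uint32_t fixed = ARM_SEC_HEADER_LEN + nctx * ARM_CTX_AARCH64_GPR;

    return budget > fixed ? (budget - fixed) / ARM_ERR_INFO_LEN : 0;
}
```

Counting only the ARM section, one context leaves room for 22 entries in 1K. Subtracting the two GHES headers from the 1K budget first brings that down to 19, and a 4K block would leave room for 115.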
Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
On Tue, 6 Aug 2024 16:31:13 +0200, Igor Mammedov wrote:

> > +    /* Could also be read back from the error_block_address register */
> > +    *error_block_addr = base +
> > +        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > +        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > +        error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> > +
> > +    return true;
> > +}
>
> I don't like all this pointer math, which is basically a reverse engineered
> QEMU actions on startup + guest provided etc/hardware_errors address.
>
> For once, it assumes error_source_to_index[] matches order in which HEST
> error sources were described, which is fragile.
>
> 2nd: migration-wive it's disaster, since old/new HEST/hardware_errors tables
> in RAM migrated from older version might not match above assumptions
> of target QEMU.
>
> I see 2 ways to rectify it:
>   1st: preferred/cleanest would be to tell QEMU (via fw_cfg) address of HEST table
>        in guest RAM, like we do with etc/hardware_errors, see
>            build_ghes_error_table()
>               ...
>               tell firmware to write hardware_errors GPA into
>        and then fetch from HEST table in RAM, the guest patched error/ack addresses
>        for given source_id
>
>        code-wise: relatively simple once one wraps their own head over
>                   how this whole APEI thing works in QEMU
>                   workflow is described in docs/specs/acpi_hest_ghes.rst
>                   look to me as sufficient to grasp it.
>                   (but my view is very biased given my prior knowledge,
>                   aka: docs/comments/examples wrt acpi patching are good
>                   enough)
>                   (if it's not clear how to do it, ask me for pointers)

That sounds like a better approach, however...

>   2nd: sort of hack based on build_ghes_v2() Error Status Address/Read Ack Register
>        patching instructions
>            bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
>                address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
>                ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
>                                              ^
>        during build_ghes_v2() also store on a side mapping
>            source_id -> error address offset : read ack address
>
>        so when you are injecting error, you'd at least use offsets
>        used at start time, to get rid of risk where injection code
>        diverge from HEST:etc/hardware_errors layout at start time.
>
>        However to make migration safe, one would need to add a fat
>        comment not to change order ghest error sources in HEST _and_
>        a dedicated unit test to make sure we catch it when that happens.
>        bios_tables_test should be able to catch the change, but it won't
>        say what's wrong, hence a test case that explicitly checks order
>        and loudly & clear complains when we will break order assumptions.
>
>        downside:
>           * we are are limiting ways HEST could be composed/reshuffled in future
>           * consumption of extra CI resources
>           * and well, it relies on above duct tape holding all pieces together

I ended up opting for approach (2) in this changeset, as the current code
already uses bios_linker_loader_add_pointer() for GHES and relies deeply
on the block address/ack and CPER calculations.

To avoid trouble with this duct tape, I opted to move all offset math to
a single function in ghes.c:

	/*
	 * ID numbers used to fill HEST source ID field
	 */
	enum AcpiHestSourceId {
	    ACPI_HEST_SRC_ID_SEA,
	    ACPI_HEST_SRC_ID_GED,

	    /* Shall be the last one */
	    ACPI_HEST_SRC_ID_COUNT
	};

	...

	static bool acpi_hest_address_offset(enum AcpiGhesNotifyType notify,
	                                     uint64_t *error_block_offset,
	                                     uint64_t *ack_offset,
	                                     uint64_t *cper_offset,
	                                     enum AcpiHestSourceId *source_id)
	{
	    enum AcpiHestSourceId source;
	    uint64_t offset;

	    switch (notify) {
	    case ACPI_GHES_NOTIFY_SEA: /* Only on ARMv8 */
	        source = ACPI_HEST_SRC_ID_SEA;
	        break;
	    case ACPI_GHES_NOTIFY_GPIO:
	        source = ACPI_HEST_SRC_ID_GED;
	        break;
	    default:
	        return true;
	    }

	    if (source_id) {
	        *source_id = source;
	    }

	    /*
	     * Please see docs/specs/acpi_hest_ghes.rst for the memory layout.
	     * In summary, memory starts with error addresses, then acks and
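The layout that the offset helper has to reproduce is the one `build_ghes_error_table()` creates in etc/hardware_errors: first one 8-byte error_block_address slot per source, then one 8-byte read_ack register per source, then one error block of `ACPI_GHES_MAX_RAW_DATA_LENGTH` bytes per source. Below is a minimal sketch of that arithmetic, relative to the start of the blob. `hest_offsets_for()` is hypothetical, and, as Igor notes, the math only holds while the source order matches the order used at table build time.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets inside etc/hardware_errors for one error source */
struct hest_offsets {
    uint64_t error_block_addr;  /* slot holding this source's error block GPA */
    uint64_t ack;               /* this source's read ack register */
    uint64_t cper;              /* start of this source's error block */
};

/*
 * Layout: [count x error_block_address][count x read_ack][count x block]
 * idx is the source's position, count the number of sources.
 */
static struct hest_offsets hest_offsets_for(unsigned idx, unsigned count,
                                            uint64_t max_raw_len)
{
    struct hest_offsets o;

    assert(idx < count);
    o.error_block_addr = idx * sizeof(uint64_t);
    o.ack = count * sizeof(uint64_t) + idx * sizeof(uint64_t);
    o.cper = 2 * count * sizeof(uint64_t) + idx * max_raw_len;
    return o;
}
```

With two sources and 1 KiB blocks, source 1 has its address slot at 8, its ack at 24 and its error block at 1056.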
[PATCH v6 06/10] acpi/ghes: add support for generic error injection via QAPI
Provide a generic interface for error injection via GHESv2.

This patch is co-authored:
    - original ghes logic to inject a simple ARM record by Shiju Jose;
    - generic logic to handle block addresses by Jonathan Cameron;
    - generic GHESv2 error inject by Mauro Carvalho Chehab;

Co-authored-by: Jonathan Cameron
Co-authored-by: Shiju Jose
Co-authored-by: Mauro Carvalho Chehab
Signed-off-by: Jonathan Cameron
Signed-off-by: Shiju Jose
Signed-off-by: Mauro Carvalho Chehab
---
 hw/acpi/ghes.c         | 78 ++
 hw/acpi/ghes_cper.c    |  2 +-
 include/hw/acpi/ghes.h |  3 ++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 26e93dd0f6e2..8525481bb828 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -534,6 +534,84 @@ int acpi_ghes_record_errors(enum AcpiGhesNotifyType notify,
 NotifierList acpi_generic_error_notifiers =
     NOTIFIER_LIST_INITIALIZER(error_device_notifiers);
 
+void ghes_record_cper_errors(AcpiGhesCper *cper, Error **errp,
+                             enum AcpiGhesNotifyType notify)
+{
+    uint64_t cper_addr, read_ack_start_addr;
+    uint64_t read_ack = 0;
+    uint32_t data_length;
+    GArray *block;
+    uint32_t i;
+
+    if (ghes_get_hardware_errors_address(notify, NULL, &read_ack_start_addr,
+                                         &cper_addr, NULL)) {
+        error_setg(errp,
+                   "GHES: Invalid error block/ack address(es) for notify %d",
+                   notify);
+        return;
+    }
+
+    cpu_physical_memory_read(read_ack_start_addr,
+                             &read_ack, sizeof(uint64_t));
+
+    /* zero means OSPM does not acknowledge the error */
+    if (!read_ack) {
+        error_setg(errp,
+                   "Last CPER record was not acknowledged yet");
+        read_ack = 1;
+        cpu_physical_memory_write(read_ack_start_addr,
+                                  &read_ack, sizeof(uint64_t));
+        return;
+    }
+
+    read_ack = cpu_to_le64(0);
+    cpu_physical_memory_write(read_ack_start_addr,
+                              &read_ack, sizeof(uint64_t));
+
+    /* Build CPER record */
+
+    /*
+     * Invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
+     * Table 17-13 Generic Error Data Entry
+     */
+    QemuUUID fru_id = {};
+    data_length = ACPI_GHES_DATA_LENGTH + cper->data_len;
+
+    /*
+     * It should not run out of the preallocated memory if
+     * adding a new generic error data entry
+     */
+    if ((data_length + ACPI_GHES_GESB_SIZE) >
+            ACPI_GHES_MAX_RAW_DATA_LENGTH) {
+        error_setg(errp, "GHES CPER record is too big: %u",
+                   data_length);
+        return;
+    }
+
+    block = g_array_new(false, true /* clear */, 1);
+    /* Build the new generic error status block header */
+    acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
+        0, 0, data_length,
+        ACPI_CPER_SEV_RECOVERABLE);
+
+    /* Build this new generic error data entry header */
+    acpi_ghes_generic_error_data(block, cper->guid,
+        ACPI_CPER_SEV_RECOVERABLE, 0, 0,
+        cper->data_len, fru_id, 0);
+
+    /* Add CPER data */
+    for (i = 0; i < cper->data_len; i++) {
+        build_append_int_noprefix(block, cper->data[i], 1);
+    }
+
+    /* Write the generic error data entry into guest memory */
+    cpu_physical_memory_write(cper_addr, block->data, block->len);
+
+    g_array_free(block, true);
+
+    notifier_list_notify(&acpi_generic_error_notifiers, NULL);
+}
+
 bool acpi_ghes_present(void)
 {
     AcpiGedState *acpi_ged_state;
diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
index 7aa7e71e90dc..d7ff7debee74 100644
--- a/hw/acpi/ghes_cper.c
+++ b/hw/acpi/ghes_cper.c
@@ -39,7 +39,7 @@ void qmp_ghes_cper(CommonPlatformErrorRecord *qmp_cper,
         return;
     }
 
-    /* TODO: call a function at ghes */
+    ghes_record_cper_errors(&cper, errp, ACPI_GHES_NOTIFY_GPIO);
 
     g_free(cper.data);
 }
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 2fcfa1cc8090..5a7bdb08f8e2 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -79,6 +79,9 @@ typedef struct AcpiGhesCper {
     size_t data_len;
 } AcpiGhesCper;
 
+void ghes_record_cper_errors(AcpiGhesCper *cper, Error **errp,
+                             enum AcpiGhesNotifyType notify);
+
 /**
  * acpi_ghes_present: Report whether ACPI GHES table is present
  *
-- 
2.45.2
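The GHESv2 read-ack handshake that `ghes_record_cper_errors()` relies on can be modelled in isolation. Firmware (here, QEMU) may only publish a new record when the read-ack register is non-zero. It clears the register when it claims the error block, and OSPM writes 1 back once it has consumed the record. The sketch below is a toy model with hypothetical names. Unlike the patch, it checks the record size before clearing the ack, so a rejected record never leaves the source looking busy, and it does not force the ack back to 1 when a record is still pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model of one GHESv2 source: read-ack register plus error block */
struct toy_ghes_source {
    uint64_t read_ack;    /* non-zero: OSPM consumed the last record */
    uint8_t block[1024];  /* stand-in for the error status block */
    size_t used;
};

/* Firmware side: publish a record only if the previous one was acked */
static bool toy_record(struct toy_ghes_source *src, const void *rec, size_t len)
{
    assert(src != NULL && (rec != NULL || len == 0));
    if (src->read_ack == 0 || len > sizeof(src->block)) {
        return false;     /* busy, or record does not fit */
    }
    src->read_ack = 0;    /* claim the block for the new record */
    memcpy(src->block, rec, len);
    src->used = len;
    return true;
}

/* OSPM side: consume the record and write 1 to the read-ack register */
static void toy_ospm_ack(struct toy_ghes_source *src)
{
    src->used = 0;
    src->read_ack = 1;
}
```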
[PATCH v6 00/10] Add ACPI CPER firmware first error injection on ARM emulation
:0x20
[ 899.194273] {5}[Hardware Error]: access mode: secure
[ 899.194544] {5}[Hardware Error]: Error info structure 3:
[ 899.194838] {5}[Hardware Error]: num errors: 2
[ 899.195088] {5}[Hardware Error]: error_type: 0x10: micro-architectural error
[ 899.195456] {5}[Hardware Error]: error_info: 0x78da03ff
[ 899.195782] {5}[Hardware Error]: Error info structure 4:
[ 899.196070] {5}[Hardware Error]: num errors: 2
[ 899.196331] {5}[Hardware Error]: error_type: 0x14: TLB error|micro-architectural error
[ 899.196733] {5}[Hardware Error]: Context info structure 0:
[ 899.197024] {5}[Hardware Error]: register context type: AArch64 EL1 context registers
[ 899.197427] {5}[Hardware Error]::
[ 899.197741] {5}[Hardware Error]: Vendor specific error info has 5 bytes:
[ 899.198096] {5}[Hardware Error]:: 13 7b 04 05 01 .{...
[ 899.198610] [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
[ 899.199000] [Firmware Warn]: GHES: Unhandled processor error type 0x04: TLB error
[ 899.199388] [Firmware Warn]: GHES: Unhandled processor error type 0x08: bus error
[ 899.199767] [Firmware Warn]: GHES: Unhandled processor error type 0x10: micro-architectural error
[ 899.200194] [Firmware Warn]: GHES: Unhandled processor error type 0x14: TLB error|micro-architectural error

---

v6:
  - PNP0C33 device creation moved to aml-build.c;
  - acpi_ghes record functions now use the ACPI notify parameter, instead of the source ID;
  - the number of source IDs is now automatically calculated;
  - some code cleanups and function/var renames;
  - some fixes and cleanups in the error injection script;
  - the ghes cper stub now produces an error if cper JSON is not compiled;
  - the offset calculation logic for GHES was refactored;
  - updated documentation to reflect the GHES allocated size;
  - added an x-mpidr object for QOM usage;
  - added a patch making use of the x-mpidr field in the ARM injection script;

v5:
  - CPER guid is now passed as a string;
  - raw-data is now passed with base64 encoding;
  - removed several GPIO leftovers from arm/virt.c changes;
  - lots of cleanups and improvements in the error injection script. It now
    better handles the QMP dialog and doesn't print debug messages. Also, the
    code was split into two modules, to make it easier to add more error
    injection commands.

v4:
  - CPER generation moved to happen outside QEMU;
  - one patch adding support for mpidr query was removed.

v3:
  - patch 1 cleanups with some comment changes and adding another place where
    the poweroff GPIO define should be used. No changes on other patches
    (except due to conflict resolution).

v2:
  - added a new patch using a define for the GPIO power pin;
  - patch 2 changed to also use a define for the generic error GPIO pin;
  - a couple of cleanups in patch 2 removing unneeded else clauses.

Jonathan Cameron (1):
  acpi/ghes: Add support for GED error device

Mauro Carvalho Chehab (9):
  acpi/generic_event_device: add an APEI error device
  arm/virt: Wire up a GED error device for ACPI / GHES
  qapi/ghes-cper: add an interface to do generic CPER error injection
  acpi/ghes: rework the logic to handle HEST source ID
  acpi/ghes: add support for generic error injection via QAPI
  docs: acpi_hest_ghes: fix documentation for CPER size
  scripts/ghes_inject: add a script to generate GHES error inject
  target/arm: add an experimental mpidr arm cpu property object
  scripts/arm_processor_error.py: retrieve mpidr if not filled

 MAINTAINERS                            |  10 +
 docs/specs/acpi_hest_ghes.rst          |   6 +-
 hw/acpi/Kconfig                        |   5 +
 hw/acpi/aml-build.c                    |  10 +
 hw/acpi/generic_event_device.c         |   8 +
 hw/acpi/ghes-stub.c                    |   3 +-
 hw/acpi/ghes.c                         | 308 ++
 hw/acpi/ghes_cper.c                    |  45 +++
 hw/acpi/ghes_cper_stub.c               |  19 ++
 hw/acpi/meson.build                    |   2 +
 hw/arm/Kconfig                         |   5 +
 hw/arm/virt-acpi-build.c               |   1 +
 hw/arm/virt.c                          |  12 +-
 include/hw/acpi/acpi_dev_interface.h   |   1 +
 include/hw/acpi/aml-build.h            |   2 +
 include/hw/acpi/generic_event_device.h |   1 +
 include/hw/acpi/ghes.h                 |  24 +-
 include/hw/arm/virt.h                  |   1 +
 qapi/ghes-cper.json                    |  55
 qapi/meson.build                       |   1 +
 qapi/qapi-schema.json                  |   1 +
 scripts/arm_processor_error.py         | 389 ++
 scripts/ghes_inject.py                 |  48 +++
 scripts/qmp_helper.py                  | 431 +
 target/arm/cpu.c                       |   1 +
 target/arm/cpu.h                       |   1 +
 target/arm/helper.c                    |  10 +-
 27 files changed, 1316 insertions(+), 84 deletions(-)
 create mode 100644 hw/acpi/ghes_c
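The `error_type` values in the log above are bitmasks from the ARM Processor Error Information Structure: 0x02 is a cache error, 0x04 a TLB error, 0x08 a bus error and 0x10 a micro-architectural error. That is why 0x14 prints as "TLB error|micro-architectural error". Below is a small decoder sketch producing the same strings as the kernel log; `arm_err_type_str()` is a hypothetical helper, not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Error type bits of the ARM Processor Error Information Structure */
static const struct {
    uint8_t bit;
    const char *name;
} arm_err_types[] = {
    { 0x02, "cache error" },
    { 0x04, "TLB error" },
    { 0x08, "bus error" },
    { 0x10, "micro-architectural error" },
};

/* Render error_type as a '|'-separated list, truncating if buf is short */
static void arm_err_type_str(uint8_t type, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(arm_err_types) / sizeof(arm_err_types[0]); i++) {
        if (type & arm_err_types[i].bit) {
            if (buf[0] != '\0') {
                strncat(buf, "|", len - strlen(buf) - 1);
            }
            strncat(buf, arm_err_types[i].name, len - strlen(buf) - 1);
        }
    }
}
```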
[PATCH v6 10/10] scripts/arm_processor_error.py: retrieve mpidr if not filled
Add support to retrieve the mpidr value via qom-get.

Signed-off-by: Mauro Carvalho Chehab
---
 scripts/arm_processor_error.py | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py
index b464254c8b7c..756935a2263c 100644
--- a/scripts/arm_processor_error.py
+++ b/scripts/arm_processor_error.py
@@ -5,12 +5,10 @@
 #
 # Copyright (C) 2024 Mauro Carvalho Chehab
 
-# TODO: current implementation has dummy defaults.
-#
-# For a better implementation, a QMP addition/call is needed to
-# retrieve some data for ARM Processor Error injection:
-#
-#   - ARM registers: power_state, mpidr.
+# Note: currently it lacks a method to fill the ARM Processor Error CPER
+# psci field from emulation. On real hardware, this is filled only
+# when a CPU is not running. Implementing support for it to simulate
+# real hardware is not trivial.
 
 import argparse
 import re
@@ -168,11 +166,27 @@ def send_cper(self, args):
         else:
             cper["running-state"] = 0
 
+        if args.mpidr:
+            cper["mpidr-el1"] = args.mpidr
+        elif cpus:
+            get_mpidr = {
+                "execute": "qom-get",
+                "arguments": {
+                    'path': cpus[0],
+                    'property': "x-mpidr"
+                }
+            }
+            ret = qmp_cmd.send_cmd(get_mpidr, may_open=True)
+            if isinstance(ret, int):
+                cper["mpidr-el1"] = ret
+        else:
+            cper["mpidr-el1"] = 0
+
         if arm_valid_init:
             if args.affinity:
                 cper["valid"] |= self.arm_valid_bits["affinity"]
 
-            if args.mpidr:
+            if "mpidr-el1" in cper:
                 cper["valid"] |= self.arm_valid_bits["mpidr"]
 
             if "running-state" in cper:
@@ -360,7 +374,7 @@ def send_cper(self, args):
             if isinstance(ret, int):
                 arg["midr-el1"] = ret
 
-        util.data_add(data, arg.get("mpidr-el1", 0), 8)
+        util.data_add(data, cper.get("mpidr-el1", 0), 8)
         util.data_add(data, arg.get("midr-el1", 0), 8)
         util.data_add(data, cper["running-state"], 4)
         util.data_add(data, arg.get("psci-state", 0), 4)
-- 
2.45.2
[PATCH v6 05/10] acpi/ghes: rework the logic to handle HEST source ID
The current logic relies on a lot of duct tape: offsets are calculated from one define holding the number of source IDs plus a separate enum. Rewrite the logic to be more resilient to code changes by moving the source ID count into the enum and making the offset calculation explicit.

This change was inspired by a patch from Jonathan Cameron that split the logic to get the CPER address into a separate function, which will be needed to support generic error injection.

Signed-off-by: Mauro Carvalho Chehab
---
 hw/acpi/ghes-stub.c    |   3 +-
 hw/acpi/ghes.c         | 225 -
 include/hw/acpi/ghes.h |  12 +--
 3 files changed, 158 insertions(+), 82 deletions(-)

diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index c315de1802d6..8762449870b5 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -11,7 +11,8 @@
 #include "qemu/osdep.h"
 #include "hw/acpi/ghes.h"
 
-int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
+int acpi_ghes_record_errors(enum AcpiGhesNotifyType notify,
+                            uint64_t physical_address)
 {
     return -1;
 }
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index d6cbeed6e3d5..26e93dd0f6e2 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -34,8 +34,16 @@
 /* The max size in bytes for one error block */
 #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
 
-/* Support ARMv8 SEA notification type error source and GPIO interrupt. */
-#define ACPI_GHES_ERROR_SOURCE_COUNT        2
+/*
+ * ID numbers used to fill HEST source ID field
+ */
+enum AcpiHestSourceId {
+    ACPI_HEST_SRC_ID_SEA,
+    ACPI_HEST_SRC_ID_GED,
+
+    /* Shall be the last one */
+    ACPI_HEST_SRC_ID_COUNT
+};
 
 /* Generic Hardware Error Source version 2 */
 #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
@@ -241,12 +249,12 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
     int i, error_status_block_offset;
 
     /* Build error_block_address */
-    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+    for (i = 0; i < ACPI_HEST_SRC_ID_COUNT; i++) {
         build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
     }
 
     /* Build read_ack_register */
-    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+    for (i = 0; i < ACPI_HEST_SRC_ID_COUNT; i++) {
         /*
          * Initialize the value of read_ack_register to 1, so GHES can be
          * writable after (re)boot.
@@ -261,13 +269,13 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
 
     /* Reserve space for Error Status Data Block */
     acpi_data_push(hardware_errors,
-        ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
+        ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_HEST_SRC_ID_COUNT);
 
     /* Tell guest firmware to place hardware_errors blob into RAM */
     bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
                              hardware_errors, sizeof(uint64_t), false);
 
-    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+    for (i = 0; i < ACPI_HEST_SRC_ID_COUNT; i++) {
         /*
          * Tell firmware to patch error_block_address entries to point to
          * corresponding "Generic Error Status Block"
@@ -286,12 +294,95 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
             0, sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
 }
 
+static bool acpi_hest_address_offset(enum AcpiGhesNotifyType notify,
+                                     uint64_t *error_block_offset,
+                                     uint64_t *ack_offset,
+                                     uint64_t *cper_offset,
+                                     enum AcpiHestSourceId *source_id)
+{
+    enum AcpiHestSourceId source;
+    uint64_t offset;
+
+    switch (notify) {
+    case ACPI_GHES_NOTIFY_SEA: /* Only on ARMv8 */
+        source = ACPI_HEST_SRC_ID_SEA;
+        break;
+    case ACPI_GHES_NOTIFY_GPIO:
+        source = ACPI_HEST_SRC_ID_GED;
+        break;
+    default:
+        return true;
+    }
+
+    if (source_id) {
+        *source_id = source;
+    }
+
+    /*
+     * Please see docs/specs/acpi_hest_ghes.rst for the memory layout.
+     * In summary, memory starts with error addresses, then acks and
+     * finally CPER blocks.
+     */
+
+    offset = source * sizeof(uint64_t);
+
+    if (error_block_offset) {
+        *error_block_offset = offset;
+    }
+    if (ack_offset) {
+        *ack_offset = offset + ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t);
+    }
+    if (cper_offset) {
+        *cper_offset = 2 * ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t) +
+                       source * ACPI_GHES_MAX_RAW_DATA_LENGTH;
+    }
+
+    return false;
+}
+
+static int ghes_get_hardware_errors_address(enum AcpiGhesNotifyType notify,
+                                            uint64_t *error_block_addr,
+
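To make the layout used by acpi_hest_address_offset() concrete, here is a small Python model of the same arithmetic. It assumes the two sources defined in this patch (SEA = 0, GED = 1), 8-byte address and ack entries, and ACPI_GHES_MAX_RAW_DATA_LENGTH = 1 KiB; the notification type values 7 (GPIO) and 8 (ARMv8 SEA) are the ACPI Hardware Error Notification types used by enum AcpiGhesNotifyType. It is an illustration, not a replacement for the C code.

```python
from enum import IntEnum

ACPI_GHES_MAX_RAW_DATA_LENGTH = 1024  # 1 KiB per CPER block
ADDR_SIZE = 8                         # sizeof(uint64_t)

# ACPI Hardware Error Notification types used by this patch
NOTIFY_GPIO = 7
NOTIFY_SEA = 8


class SourceId(IntEnum):
    SEA = 0
    GED = 1


SOURCE_COUNT = len(SourceId)  # plays the role of ACPI_HEST_SRC_ID_COUNT

NOTIFY_TO_SOURCE = {NOTIFY_SEA: SourceId.SEA, NOTIFY_GPIO: SourceId.GED}


def hest_offsets(notify):
    """Return (error_block, ack, cper) offsets, or None if unsupported."""
    source = NOTIFY_TO_SOURCE.get(notify)
    if source is None:
        return None  # the C code returns true (error) here
    offset = source * ADDR_SIZE
    error_block = offset
    ack = offset + SOURCE_COUNT * ADDR_SIZE
    cper = 2 * SOURCE_COUNT * ADDR_SIZE + source * ACPI_GHES_MAX_RAW_DATA_LENGTH
    return error_block, ack, cper


print(hest_offsets(NOTIFY_SEA))   # (0, 16, 32)
print(hest_offsets(NOTIFY_GPIO))  # (8, 24, 1056)
```

Adding a new source only requires a new enum entry before the count, which is the resilience the commit message is after.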
[PATCH v6 03/10] acpi/ghes: Add support for GED error device
From: Jonathan Cameron

Now that a GED error device is defined, add another type of notification: error notification for GHESv2 using the GED error device, triggered via interrupt.

[mchehab: do some cleanups at ACPI_HEST_SRC_ID_* checks and rename the HEST event to better identify the GED interrupt OSPM]

Signed-off-by: Jonathan Cameron
Signed-off-by: Mauro Carvalho Chehab
---
 hw/acpi/ghes.c         | 12 +---
 include/hw/acpi/ghes.h |  3 ++-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 13b105c5d02d..d6cbeed6e3d5 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -34,8 +34,8 @@
 /* The max size in bytes for one error block */
 #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
 
-/* Now only support ARMv8 SEA notification type error source */
-#define ACPI_GHES_ERROR_SOURCE_COUNT        1
+/* Support ARMv8 SEA notification type error source and GPIO interrupt. */
+#define ACPI_GHES_ERROR_SOURCE_COUNT        2
 
 /* Generic Hardware Error Source version 2 */
 #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
@@ -290,6 +290,9 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
 static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
 {
     uint64_t address_offset;
+
+    assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
+
     /*
      * Type:
      * Generic Hardware Error Source version 2(GHESv2 - Type 10)
@@ -327,6 +330,9 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
          */
         build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
         break;
+    case ACPI_HEST_NOTIFY_EXTERNAL:
+        build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_GPIO);
+        break;
     default:
         error_report("Not support this error source");
         abort();
@@ -370,6 +376,7 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
     /* Error Source Count */
     build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
     build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
+    build_ghes_v2(table_data, ACPI_HEST_NOTIFY_EXTERNAL, linker);
 
     acpi_table_end(linker, &table);
 }
@@ -406,7 +413,6 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
     start_addr = le64_to_cpu(ags->ghes_addr_le);
 
     if (physical_address) {
-
         if (source_id < ACPI_HEST_SRC_ID_RESERVED) {
             start_addr += source_id * sizeof(uint64_t);
         }
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index fb80897e7eac..ce6f82a1155a 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -59,9 +59,10 @@ enum AcpiGhesNotifyType {
     ACPI_GHES_NOTIFY_RESERVED = 12
 };
 
+/* Those are used as table indexes when building GHES tables */
 enum {
     ACPI_HEST_SRC_ID_SEA = 0,
-    /* future ids go here */
+    ACPI_HEST_NOTIFY_EXTERNAL,
     ACPI_HEST_SRC_ID_RESERVED,
 };
 
-- 
2.45.2
[PATCH v6 07/10] docs: acpi_hest_ghes: fix documentation for CPER size
While the spec states a CPER size of 4 KiB for each record, the code currently sets it to 1 KiB. Fix the documentation and reference the macro name there, which should help keep it up to date.

Signed-off-by: Mauro Carvalho Chehab
---
 docs/specs/acpi_hest_ghes.rst | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
index 68f1fbe0a4af..c3e9f8d9a702 100644
--- a/docs/specs/acpi_hest_ghes.rst
+++ b/docs/specs/acpi_hest_ghes.rst
@@ -67,8 +67,10 @@ Design Details
 (3) The address registers table contains N Error Block Address entries
     and N Read Ack Register entries. The size for each entry is 8-byte.
     The Error Status Data Block table contains N Error Status Data Block
-    entries. The size for each entry is 4096(0x1000) bytes. The total size
-    for the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
+    entries. The size for each entry is defined at the source code as
+    ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size
+    for the "etc/hardware_errors" fw_cfg blob is
+    (N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes.
     N is the number of the kinds of hardware error sources.
 
 (4) QEMU generates the ACPI linker/loader script for the firmware. The
-- 
2.45.2
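As a quick sanity check of the formula in the updated text, the sketch below evaluates it for N = 2 (the SEA and GED sources added in this series), both with the current 1024-byte block size and with the 4096 bytes the old text documented. The function name is illustrative only.

```python
def hardware_errors_blob_size(n_sources, raw_data_length=1024):
    """Size in bytes of the "etc/hardware_errors" fw_cfg blob.

    N Error Block Address entries plus N Read Ack Register entries
    (8 bytes each), followed by N Error Status Data Blocks of
    raw_data_length bytes (ACPI_GHES_MAX_RAW_DATA_LENGTH).
    """
    return n_sources * 8 * 2 + n_sources * raw_data_length


print(hardware_errors_blob_size(2))        # 2080 with 1 KiB blocks
print(hardware_errors_blob_size(2, 4096))  # 8224 with the old 4 KiB value
```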
[PATCH v6 09/10] target/arm: add an experimental mpidr arm cpu property object
Accurately injecting an ARM Processor Error ACPI/APEI GHES error record requires the value of the ARM Multiprocessor Affinity Register (MPIDR). While the ARM emulation implements this register, its value is currently not visible outside the CPU model. Add a field to the CPU state storing it and expose it in arm_cpu_properties as an experimental property, allowing it to be queried via QMP with the qom-get command.

Signed-off-by: Mauro Carvalho Chehab
---
 target/arm/cpu.c    |  1 +
 target/arm/cpu.h    |  1 +
 target/arm/helper.c | 10 --
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 19191c239181..30fcf0a10f46 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2619,6 +2619,7 @@ static ObjectClass *arm_cpu_class_by_name(const char *cpu_model)
 
 static Property arm_cpu_properties[] = {
     DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
+    DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0),
     DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
                         mp_affinity, ARM64_AFFINITY_INVALID),
     DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index a12859fc5335..d2e86f0877cc 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1033,6 +1033,7 @@ struct ArchCPU {
         uint64_t reset_pmcr_el0;
     } isar;
     uint64_t midr;
+    uint64_t mpidr;
     uint32_t revidr;
     uint32_t reset_fpsid;
     uint64_t ctr;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 8fb4b474e83f..16e75b7c5ed9 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4692,7 +4692,7 @@ static uint64_t mpidr_read_val(CPUARMState *env)
     return mpidr;
 }
 
-static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
+static uint64_t mpidr_read(CPUARMState *env)
 {
     unsigned int cur_el = arm_current_el(env);
 
@@ -4702,6 +4702,11 @@ static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
     return mpidr_read_val(env);
 }
 
+static uint64_t mpidr_read_ri(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+    return mpidr_read(env);
+}
+
 static const ARMCPRegInfo lpae_cp_reginfo[] = {
     /* NOP AMAIR0/1 */
     { .name = "AMAIR0", .state = ARM_CP_STATE_BOTH,
@@ -9723,7 +9728,7 @@ void register_cp_regs_for_features(ARMCPU *cpu)
             { .name = "MPIDR_EL1", .state = ARM_CP_STATE_BOTH,
               .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 5,
               .fgt = FGT_MPIDR_EL1,
-              .access = PL1_R, .readfn = mpidr_read, .type = ARM_CP_NO_RAW },
+              .access = PL1_R, .readfn = mpidr_read_ri, .type = ARM_CP_NO_RAW },
         };
 #ifdef CONFIG_USER_ONLY
         static const ARMCPRegUserSpaceInfo mpidr_user_cp_reginfo[] = {
@@ -9733,6 +9738,7 @@ void register_cp_regs_for_features(ARMCPU *cpu)
         modify_arm_cp_regs(mpidr_cp_reginfo, mpidr_user_cp_reginfo);
 #endif
         define_arm_cp_regs(cpu, mpidr_cp_reginfo);
+        cpu->mpidr = mpidr_read(env);
     }
 
     if (arm_feature(env, ARM_FEATURE_AUXCR)) {
-- 
2.45.2
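Once x-mpidr is readable, a consumer usually wants the affinity fields rather than the raw value. The sketch below splits an AArch64 MPIDR_EL1 value using the architectural field positions (Aff0 bits [7:0], Aff1 [15:8], Aff2 [23:16], MT bit 24, U bit 30, Aff3 [39:32]). The helper is hypothetical and not part of the patch or of the injection script.

```python
def mpidr_affinity(mpidr):
    """Split an AArch64 MPIDR_EL1 value into its affinity fields."""
    return {
        "aff0": mpidr & 0xFF,
        "aff1": (mpidr >> 8) & 0xFF,
        "aff2": (mpidr >> 16) & 0xFF,
        "aff3": (mpidr >> 32) & 0xFF,
        "mt": (mpidr >> 24) & 0x1,   # multithreading flag
        "u": (mpidr >> 30) & 0x1,    # uniprocessor flag
    }


# Example value: bit 31 (RES1) set, Aff1 = 1, Aff0 = 2
print(mpidr_affinity(0x80000102))
```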
[PATCH v6 02/10] arm/virt: Wire up a GED error device for ACPI / GHES
Add support to the ARM virt machine for handling the generic error ACPI event via GED and the error source device.

It is aligned with the Linux kernel patch:
https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.hu...@intel.com/

Co-authored-by: Mauro Carvalho Chehab
Co-authored-by: Jonathan Cameron
Signed-off-by: Jonathan Cameron
Signed-off-by: Mauro Carvalho Chehab
---
 hw/acpi/ghes.c           |  3 +++
 hw/arm/virt-acpi-build.c |  1 +
 hw/arm/virt.c            | 12 +++-
 include/hw/acpi/ghes.h   |  3 +++
 include/hw/arm/virt.h    |  1 +
 5 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index e9511d9b8f71..13b105c5d02d 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -444,6 +444,9 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
     return ret;
 }
 
+NotifierList acpi_generic_error_notifiers =
+    NOTIFIER_LIST_INITIALIZER(acpi_generic_error_notifiers);
+
 bool acpi_ghes_present(void)
 {
     AcpiGedState *acpi_ged_state;
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f76fb117adff..1769467d23b2 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -858,6 +858,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     }
 
     acpi_dsdt_add_power_button(scope);
+    aml_append(scope, aml_error_device());
 #ifdef CONFIG_TPM
     acpi_dsdt_add_tpm(scope, vms);
 #endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 687fe0bb8bc9..22448e5c5b73 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -677,7 +677,7 @@ static inline DeviceState *create_acpi_ged(VirtMachineState *vms)
     DeviceState *dev;
     MachineState *ms = MACHINE(vms);
     int irq = vms->irqmap[VIRT_ACPI_GED];
-    uint32_t event = ACPI_GED_PWR_DOWN_EVT;
+    uint32_t event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT;
 
     if (ms->ram_slots) {
         event |= ACPI_GED_MEM_HOTPLUG_EVT;
@@ -1009,6 +1009,13 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
     }
 }
 
+static void virt_generic_error_req(Notifier *n, void *opaque)
+{
+    VirtMachineState *s = container_of(n, VirtMachineState, generic_error_notifier);
+
+    acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR);
+}
+
 static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
                              uint32_t phandle)
 {
@@ -2385,6 +2392,9 @@ static void machvirt_init(MachineState *machine)
 
     if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) {
         vms->acpi_dev = create_acpi_ged(vms);
+        vms->generic_error_notifier.notify = virt_generic_error_req;
+        notifier_list_add(&acpi_generic_error_notifiers,
+                          &vms->generic_error_notifier);
     } else {
         create_gpio_devices(vms, VIRT_GPIO, sysmem);
     }
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 674f6958e905..fb80897e7eac 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -23,6 +23,9 @@
 #define ACPI_GHES_H
 
 #include "hw/acpi/bios-linker-loader.h"
+#include "qemu/notify.h"
+
+extern NotifierList acpi_generic_error_notifiers;
 
 /*
  * Values for Hardware Error Notification Type field
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index a4d937ed45ac..ad9f6e94dcc5 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -175,6 +175,7 @@ struct VirtMachineState {
     DeviceState *gic;
     DeviceState *acpi_dev;
     Notifier powerdown_notifier;
+    Notifier generic_error_notifier;
     PCIBus *bus;
     char *oem_id;
     char *oem_table_id;
-- 
2.45.2
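The wiring above follows QEMU's notifier pattern: the machine registers a callback on a global list, and whoever fires the list does not need to know about the GED device. A minimal Python model of that flow is sketched below. The classes are illustrative stand-ins, not QEMU APIs, and the code that actually fires acpi_generic_error_notifiers (presumably the GHES code once a record is stored) is not part of this patch.

```python
ACPI_GENERIC_ERROR = 128  # AcpiEventStatusBits value added in this series


class NotifierList:
    """Minimal stand-in for QEMU's NotifierList (illustrative only)."""

    def __init__(self):
        self._notifiers = []

    def add(self, callback):
        self._notifiers.append(callback)

    def notify(self, opaque=None):
        for callback in self._notifiers:
            callback(opaque)


class FakeVirtMachine:
    """Records events instead of raising a GED interrupt."""

    def __init__(self, error_notifiers):
        self.sent_events = []
        error_notifiers.add(self.generic_error_req)

    def generic_error_req(self, opaque):
        # Stands in for acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR)
        self.sent_events.append(ACPI_GENERIC_ERROR)


acpi_generic_error_notifiers = NotifierList()
vm = FakeVirtMachine(acpi_generic_error_notifiers)
acpi_generic_error_notifiers.notify()  # e.g. after a CPER record is stored
print(vm.sent_events)  # [128]
```

Keeping the list global lets non-virt machines opt in later without touching the GHES code.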
[PATCH v6 08/10] scripts/ghes_inject: add a script to generate GHES error inject
Using the QMP GHESv2 API requires preparing a raw data array containing a CPER record.

Add a helper script with subcommands to prepare such data.

Currently, only the ARM Processor Error CPER record is supported.

Signed-off-by: Mauro Carvalho Chehab
---
 MAINTAINERS                    |   3 +
 qapi/ghes-cper.json            |   4 +-
 scripts/arm_processor_error.py | 375
 scripts/ghes_inject.py         |  48
 scripts/qmp_helper.py          | 431 +
 5 files changed, 859 insertions(+), 2 deletions(-)
 create mode 100644 scripts/arm_processor_error.py
 create mode 100755 scripts/ghes_inject.py
 create mode 100644 scripts/qmp_helper.py

diff --git a/MAINTAINERS b/MAINTAINERS
index a0c36f9b5d0c..9ad336381dbe 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2083,6 +2083,9 @@ S: Maintained
 F: hw/arm/ghes_cper.c
 F: hw/acpi/ghes_cper_stub.c
 F: qapi/ghes-cper.json
+F: scripts/ghes_inject.py
+F: scripts/arm_processor_error.py
+F: scripts/qmp_helper.py
 
 ppc4xx
 L: qemu-...@nongnu.org
diff --git a/qapi/ghes-cper.json b/qapi/ghes-cper.json
index 3cc4f9f2aaa9..d650996a7150 100644
--- a/qapi/ghes-cper.json
+++ b/qapi/ghes-cper.json
@@ -36,8 +36,8 @@
 ##
 # @ghes-cper:
 #
-# Inject ARM Processor error with data to be filled according with
-# ACPI 6.2 GHESv2 spec.
+# Inject CPER error data, filled according to the ACPI 6.2 spec,
+# via GHESv2.
 #
 # @cper: a single CPER record to be sent to the guest OS.
 #
diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py
new file mode 100644
index ..b464254c8b7c
--- /dev/null
+++ b/scripts/arm_processor_error.py
@@ -0,0 +1,375 @@
+#!/usr/bin/env python3
+#
+# pylint: disable=C0301,C0114,R0903,R0912,R0913,R0914,R0915,W0511
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2024 Mauro Carvalho Chehab
+
+# TODO: current implementation has dummy defaults.
+#
+# For a better implementation, a QMP addition/call is needed to
+# retrieve some data for ARM Processor Error injection:
+#
+#  - ARM registers: power_state, mpidr.
+
+import argparse
+import re
+
+from qmp_helper import qmp, util, cper_guid
+
+class ArmProcessorEinj:
+    """
+    Implements ARM Processor Error injection via GHES
+    """
+
+    DESC = """
+    Generates an ARM processor error CPER, compatible with
+    UEFI 2.9A Errata.
+    """
+
+    ACPI_GHES_ARM_CPER_LENGTH = 40
+    ACPI_GHES_ARM_CPER_PEI_LENGTH = 32
+
+    # Context types
+    CONTEXT_AARCH32_EL1 = 1
+    CONTEXT_AARCH64_EL1 = 5
+    CONTEXT_MISC_REG = 8
+
+    def __init__(self, subparsers):
+        """Initialize the error injection class and add subparser"""
+
+        # Valid choice values
+        self.arm_valid_bits = {
+            "mpidr":    util.bit(0),
+            "affinity": util.bit(1),
+            "running":  util.bit(2),
+            "vendor":   util.bit(3),
+        }
+
+        self.pei_flags = {
+            "first":        util.bit(0),
+            "last":         util.bit(1),
+            "propagated":   util.bit(2),
+            "overflow":     util.bit(3),
+        }
+
+        self.pei_error_types = {
+            "cache":        util.bit(1),
+            "tlb":          util.bit(2),
+            "bus":          util.bit(3),
+            "micro-arch":   util.bit(4),
+        }
+
+        self.pei_valid_bits = {
+            "multiple-error":   util.bit(0),
+            "flags":            util.bit(1),
+            "error-info":       util.bit(2),
+            "virt-addr":        util.bit(3),
+            "phy-addr":         util.bit(4),
+        }
+
+        self.data = bytearray()
+
+        parser = subparsers.add_parser("arm", description=self.DESC)
+
+        arm_valid_bits = ",".join(self.arm_valid_bits.keys())
+        flags = ",".join(self.pei_flags.keys())
+        error_types = ",".join(self.pei_error_types.keys())
+        pei_valid_bits = ",".join(self.pei_valid_bits.keys())
+
+        # UEFI N.16 ARM Validation bits
+        g_arm = parser.add_argument_group("ARM processor")
+        g_arm.add_argument("--arm", "--arm-valid",
+                           help=f"ARM valid bits: {arm_valid_bits}")
+        g_arm.add_argument("-a", "--affinity", "--level", "--affinity-level",
+                           type=lambda x: int(x, 0),
+                           help="Affinity level (when multiple levels apply)")
+        g_arm.add_argument("-l", "--mpidr", type=lambda x: int(x, 0),
+                           help="Multiprocessor Affinity Register")
+        g_arm.add_argument("-
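The dictionaries above map option names to bit positions. A comma-separated option such as `--arm mpidr,running` then has to be folded into a single validation bitmask. The rest of the script is truncated here, so the sketch below is not its actual parser. It assumes `util.bit(n)` returns `1 << n`, and `parse_bits()` is a hypothetical helper.

```python
def bit(n):
    """Assumed equivalent of util.bit(): a value with only bit n set."""
    return 1 << n


ARM_VALID_BITS = {
    "mpidr": bit(0),
    "affinity": bit(1),
    "running": bit(2),
    "vendor": bit(3),
}


def parse_bits(names, table):
    """Fold a comma-separated list such as "mpidr,running" into a bitmask."""
    mask = 0
    for name in names.split(","):
        name = name.strip()
        if name not in table:
            raise ValueError(f"invalid choice {name!r}; valid: {','.join(table)}")
        mask |= table[name]
    return mask


print(parse_bits("mpidr,running", ARM_VALID_BITS))  # 5
```

Rejecting unknown names early gives the user the list of valid choices instead of silently producing a wrong mask.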
[PATCH v6 04/10] qapi/ghes-cper: add an interface to do generic CPER error injection
Create a QMP command for generic ACPI APEI hardware error injection (HEST) via GHESv2.

The actual GHES code will be added in a follow-up patch.

Signed-off-by: Mauro Carvalho Chehab
Signed-off-by: Shiju Jose
Reviewed-by: Jonathan Cameron
---
 MAINTAINERS              |  7 +
 hw/acpi/Kconfig          |  5
 hw/acpi/ghes_cper.c      | 45
 hw/acpi/ghes_cper_stub.c | 19 ++
 hw/acpi/meson.build      |  2 ++
 hw/arm/Kconfig           |  5
 include/hw/acpi/ghes.h   |  7 +
 qapi/ghes-cper.json      | 55
 qapi/meson.build         |  1 +
 qapi/qapi-schema.json    |  1 +
 10 files changed, 147 insertions(+)
 create mode 100644 hw/acpi/ghes_cper.c
 create mode 100644 hw/acpi/ghes_cper_stub.c
 create mode 100644 qapi/ghes-cper.json

diff --git a/MAINTAINERS b/MAINTAINERS
index 10af21263293..a0c36f9b5d0c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2077,6 +2077,13 @@ F: hw/acpi/ghes.c
 F: include/hw/acpi/ghes.h
 F: docs/specs/acpi_hest_ghes.rst
 
+ACPI/HEST/GHES/ARM processor CPER
+R: Mauro Carvalho Chehab
+S: Maintained
+F: hw/arm/ghes_cper.c
+F: hw/acpi/ghes_cper_stub.c
+F: qapi/ghes-cper.json
+
 ppc4xx
 L: qemu-...@nongnu.org
 S: Orphan
diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index e07d3204eb36..73ffbb82c150 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -51,6 +51,11 @@ config ACPI_APEI
     bool
     depends on ACPI
 
+config GHES_CPER
+    bool
+    depends on ACPI_APEI
+    default y
+
 config ACPI_PCI
     bool
     depends on ACPI && PCI
diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
new file mode 100644
index ..7aa7e71e90dc
--- /dev/null
+++ b/hw/acpi/ghes_cper.c
@@ -0,0 +1,45 @@
+/*
+ * ARM Processor error injection
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "qemu/base64.h"
+#include "qemu/error-report.h"
+#include "qemu/uuid.h"
+#include "qapi/qapi-commands-ghes-cper.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_ghes_cper(CommonPlatformErrorRecord *qmp_cper,
+                   Error **errp)
+{
+    int rc;
+    AcpiGhesCper cper;
+    QemuUUID be_uuid, le_uuid;
+
+    rc = qemu_uuid_parse(qmp_cper->notification_type, &be_uuid);
+    if (rc) {
+        error_setg(errp, "GHES: Invalid UUID: %s",
+                   qmp_cper->notification_type);
+        return;
+    }
+
+    le_uuid = qemu_uuid_bswap(be_uuid);
+    cper.guid = le_uuid.data;
+
+    cper.data = qbase64_decode(qmp_cper->raw_data, -1,
+                               &cper.data_len, errp);
+    if (!cper.data) {
+        return;
+    }
+
+    /* TODO: call a function at ghes */
+
+    g_free(cper.data);
+}
diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c
new file mode 100644
index ..2358e039b181
--- /dev/null
+++ b/hw/acpi/ghes_cper_stub.c
@@ -0,0 +1,19 @@
+/*
+ * ARM Processor error injection
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-ghes-cper.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_ghes_cper(CommonPlatformErrorRecord *cper, Error **errp)
+{
+    error_setg(errp, "GHES QMP error inject is not compiled in");
+}
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index fa5c07db9068..6cbf430eb66d 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -34,4 +34,6 @@ endif
 system_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c', 'acpi_interface.c'))
 system_ss.add(when: 'CONFIG_ACPI_PCI_BRIDGE', if_false: files('pci-bridge-stub.c'))
 system_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss)
+system_ss.add(when: 'CONFIG_GHES_CPER', if_true: files('ghes_cper.c'))
+system_ss.add(when: 'CONFIG_GHES_CPER', if_false: files('ghes_cper_stub.c'))
 system_ss.add(files('acpi-qmp-cmds.c'))
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 1ad60da7aa2d..bed6ba27d715 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -712,3 +712,8 @@ config ARMSSE
     select UNIMP
     select SSE_COUNTER
     select SSE_TIMER
+
+config GHES_CPER
+    bool
+    depends on ARM
+    default y if AARCH64
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index ce6f82a1155a..a7a18c7b50cf 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -23,6 +23,7 @@
 #define ACPI_GHES_H
 
 #include "hw/acpi/bios-linker-loader.h"
+#include "qapi/error.h"
 #include "qemu/no
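From the client side, qmp_ghes_cper() expects a GUID string plus a base64-encoded payload, and it byte-swaps the GUID with qemu_uuid_bswap() before use. The sketch below builds such a request and shows the resulting little-endian GUID layout, which matches Python's `uuid.UUID.bytes_le` (the first three GUID fields byte-swapped). The argument names `cper`, `notification-type` and `raw-data` are inferred from the QAPI documentation and C field names in this series and should be treated as assumptions. The GUID is the UEFI ARM Processor Error section type, and the 8 zero bytes are a placeholder payload, not a valid CPER record.

```python
import base64
import json
import uuid

# UEFI ARM Processor Error section type GUID
ARM_PROCESSOR_ERROR_GUID = "e19e3d16-bc11-11e4-9caa-c2051d5d46b0"


def build_ghes_cper_cmd(guid, payload):
    """Build a ghes-cper QMP request (argument names assumed from the QAPI)."""
    return {
        "execute": "ghes-cper",
        "arguments": {
            "cper": {
                "notification-type": guid,
                "raw-data": base64.b64encode(payload).decode("ascii"),
            }
        },
    }


def guid_le_bytes(guid):
    """GUID bytes with the first three fields little-endian, as after bswap."""
    return uuid.UUID(guid).bytes_le


cmd = build_ghes_cper_cmd(ARM_PROCESSOR_ERROR_GUID, bytes(8))
print(json.dumps(cmd))
print(guid_le_bytes(ARM_PROCESSOR_ERROR_GUID).hex())
```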
[PATCH v6 01/10] acpi/generic_event_device: add an APEI error device
Add a generic error device, using HID PNP0C33, to handle generic hardware error events as specified in ACPI 6.5, section 18.3.2.7.2:

https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources

The PNP0C33 device is used to report hardware errors to the guest via the ACPI APEI Generic Hardware Error Source (GHES).

Co-authored-by: Mauro Carvalho Chehab
Co-authored-by: Jonathan Cameron
Signed-off-by: Jonathan Cameron
Signed-off-by: Mauro Carvalho Chehab
---
 hw/acpi/aml-build.c                    | 10 ++
 hw/acpi/generic_event_device.c         |  8
 include/hw/acpi/acpi_dev_interface.h   |  1 +
 include/hw/acpi/aml-build.h            |  2 ++
 include/hw/acpi/generic_event_device.h |  1 +
 5 files changed, 22 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 6d4517cfbe3d..cb167523859f 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2520,3 +2520,13 @@ Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source)
 
     return var;
 }
+
+/* ACPI 5.0: 18.3.2.6.2 Event Notification For Generic Error Sources */
+Aml *aml_error_device(void)
+{
+    Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
+    aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
+    aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+
+    return dev;
+}
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 15b4c3ebbf24..1673e9695be3 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
     ACPI_GED_PWR_DOWN_EVT,
     ACPI_GED_NVDIMM_HOTPLUG_EVT,
     ACPI_GED_CPU_HOTPLUG_EVT,
+    ACPI_GED_ERROR_EVT
 };
 
 /*
@@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev,
                            aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
                                       aml_int(0x80)));
                 break;
+            case ACPI_GED_ERROR_EVT:
+                aml_append(if_ctx,
+                           aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
+                                      aml_int(0x80)));
+                break;
             case ACPI_GED_NVDIMM_HOTPLUG_EVT:
                 aml_append(if_ctx,
                            aml_notify(aml_name("\\_SB.NVDR"),
@@ -295,6 +301,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
         sel = ACPI_GED_MEM_HOTPLUG_EVT;
     } else if (ev & ACPI_POWER_DOWN_STATUS) {
         sel = ACPI_GED_PWR_DOWN_EVT;
+    } else if (ev & ACPI_GENERIC_ERROR) {
+        sel = ACPI_GED_ERROR_EVT;
     } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
         sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
     } else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
index 68d9d15f50aa..8294f8f0ccca 100644
--- a/include/hw/acpi/acpi_dev_interface.h
+++ b/include/hw/acpi/acpi_dev_interface.h
@@ -13,6 +13,7 @@ typedef enum {
     ACPI_NVDIMM_HOTPLUG_STATUS = 16,
     ACPI_VMGENID_CHANGE_STATUS = 32,
     ACPI_POWER_DOWN_STATUS = 64,
+    ACPI_GENERIC_ERROR = 128,
 } AcpiEventStatusBits;
 
 #define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index a3784155cb33..44d1a6af0c69 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -252,6 +252,7 @@ struct CrsRangeSet {
 /* Consumer/Producer */
 #define AML_SERIAL_BUS_FLAG_CONSUME_ONLY        (1 << 1)
 
+#define ACPI_APEI_ERROR_DEVICE   "GEDD"
 /**
  * init_aml_allocator:
  *
@@ -382,6 +383,7 @@ Aml *aml_dma(AmlDmaType typ, AmlDmaBusMaster bm, AmlTransferSize sz,
              uint8_t channel);
 Aml *aml_sleep(uint64_t msec);
 Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source);
+Aml *aml_error_device(void);
 
 /* Block AML object primitives */
 Aml *aml_scope(const char *name_format, ...) G_GNUC_PRINTF(1, 2);
diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h
index 40af3550b56d..9ace8fe70328 100644
--- a/include/hw/acpi/generic_event_device.h
+++ b/include/hw/acpi/generic_event_device.h
@@ -98,6 +98,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
 #define ACPI_GED_PWR_DOWN_EVT      0x2
 #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
 #define ACPI_GED_CPU_HOTPLUG_EVT    0x8
+#define ACPI_GED_ERROR_EVT          0x10
 
 typedef struct GEDState {
     MemoryRegion evt;
-- 
2.45.2
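The event routing added to acpi_ged_send_event() is a first-match if/else-if chain from an AcpiEventStatusBits flag to a GED selector bit. The GED AML then issues Notify(GEDD, 0x80) for ACPI_GED_ERROR_EVT. The Python model below covers only the subset of events visible in this patch and ignores the memory and CPU hotplug branches. It illustrates the mapping and is not QEMU code.

```python
# Event bits from this patch (AcpiEventStatusBits / ACPI_GED_*_EVT)
ACPI_NVDIMM_HOTPLUG_STATUS = 16
ACPI_POWER_DOWN_STATUS = 64
ACPI_GENERIC_ERROR = 128

ACPI_GED_PWR_DOWN_EVT = 0x2
ACPI_GED_NVDIMM_HOTPLUG_EVT = 0x4
ACPI_GED_ERROR_EVT = 0x10

# Checked in order, like the if/else-if chain in acpi_ged_send_event()
EVENT_TO_GED = [
    (ACPI_POWER_DOWN_STATUS, ACPI_GED_PWR_DOWN_EVT),
    (ACPI_GENERIC_ERROR, ACPI_GED_ERROR_EVT),
    (ACPI_NVDIMM_HOTPLUG_STATUS, ACPI_GED_NVDIMM_HOTPLUG_EVT),
]


def ged_select(ev):
    """Return the GED selector bit for ev, or None if it is unsupported."""
    for status_bit, ged_evt in EVENT_TO_GED:
        if ev & status_bit:
            return ged_evt
    return None


print(hex(ged_select(ACPI_GENERIC_ERROR)))  # 0x10
```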
[PATCH] arm/virt: place power button pin number on a define
Magic numbers inside the code are error-prone, so create a macro holding the pin number instead.

Link: https://lore.kernel.org/qemu-devel/CAFEAcA-PYnZ-32MRX+PgvzhnoAV80zBKMYg61j2f=ohagfw...@mail.gmail.com/
Signed-off-by: Mauro Carvalho Chehab
Suggested-by: Peter Maydell
Reviewed-by: Jonathan Cameron
Reviewed-by: Igor Mammedov
---
 hw/arm/virt-acpi-build.c | 6 +++---
 hw/arm/virt.c            | 7 ---
 include/hw/arm/virt.h    | 3 +++
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index e10cad86dd73..f76fb117adff 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -154,10 +154,10 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
     aml_append(dev, aml_name_decl("_CRS", crs));
 
     Aml *aei = aml_resource_template();
-    /* Pin 3 for power button */
-    const uint32_t pin_list[1] = {3};
+
+    const uint32_t pin = GPIO_PIN_POWER_BUTTON;
     aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
-                                 AML_EXCLUSIVE, AML_PULL_UP, 0, pin_list, 1,
+                                 AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
                                  "GPO0", NULL, 0));
     aml_append(dev, aml_name_decl("_AEI", aei));
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 719e83e6a1e7..687fe0bb8bc9 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1004,7 +1004,7 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
     if (s->acpi_dev) {
         acpi_send_event(s->acpi_dev, ACPI_POWER_DOWN_STATUS);
     } else {
-        /* use gpio Pin 3 for power button event */
+        /* use gpio Pin for power button event */
         qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);
     }
 }
@@ -1013,7 +1013,8 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
                              uint32_t phandle)
 {
     gpio_key_dev = sysbus_create_simple("gpio-key", -1,
-                                        qdev_get_gpio_in(pl061_dev, 3));
+                                        qdev_get_gpio_in(pl061_dev,
+                                                         GPIO_PIN_POWER_BUTTON));
 
     qemu_fdt_add_subnode(fdt, "/gpio-keys");
     qemu_fdt_setprop_string(fdt, "/gpio-keys", "compatible", "gpio-keys");
@@ -1024,7 +1025,7 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
     qemu_fdt_setprop_cell(fdt, "/gpio-keys/poweroff", "linux,code",
                           KEY_POWER);
     qemu_fdt_setprop_cells(fdt, "/gpio-keys/poweroff",
-                           "gpios", phandle, 3, 0);
+                           "gpios", phandle, GPIO_PIN_POWER_BUTTON, 0);
 }
 
 #define SECURE_GPIO_POWEROFF 0
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index ab961bb6a9b8..a4d937ed45ac 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -47,6 +47,9 @@
 /* See Linux kernel arch/arm64/include/asm/pvclock-abi.h */
 #define PVTIME_SIZE_PER_CPU 64
 
+/* GPIO pins */
+#define GPIO_PIN_POWER_BUTTON  3
+
 enum {
     VIRT_FLASH,
     VIRT_MEM,
-- 
2.45.2
Re: [PATCH v6 00/10] Add ACPI CPER firmware first error injection on ARM emulation
On Thu, 8 Aug 2024 14:26:26 +0200
Mauro Carvalho Chehab wrote:

> v6:
> - PNP0C33 device creation moved to aml-build.c;
> - acpi_ghes record functions now use ACPI notify parameter,
>   instead of source ID;
> - the number of source IDs is now automatically calculated;
> - some code cleanups and function/var renames;
> - some fixes and cleanups at the error injection script;
> - ghes cper stub now produces an error if cper JSON is not compiled;
> - Offset calculation logic for GHES was refactored;
> - Updated documentation to reflect the GHES allocated size;
> - Added a x-mpidr object for QOM usage;
> - Added a patch making usage of x-mpidr field at ARM injection
>   script;

Forgot to mention: I dropped the PIN cleanup from this series and submitted it separately, as it is no longer related to this changeset:

https://lore.kernel.org/qemu-devel/ef0e7f5fca6cd94eda415ecee670c3028c671b74.1723121692.git.mchehab+hua...@kernel.org/T/#u

Thanks,
Mauro
Re: [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection
On Thu, 08 Aug 2024 10:50:33 +0200
Markus Armbruster wrote:

> Mauro Carvalho Chehab writes:
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 98eddf7ae155..655edcb6688c 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
> >  F: include/hw/acpi/ghes.h
> >  F: docs/specs/acpi_hest_ghes.rst
> >  
> > +ACPI/HEST/GHES/ARM processor CPER
> > +R: Mauro Carvalho Chehab
> > +S: Maintained
> > +F: hw/arm/ghes_cper.c
> > +F: hw/acpi/ghes_cper_stub.c
> > +F: qapi/ghes-cper.json
> > +
>
> Here's the reason for creating a new QAPI module instead of adding to
> existing module acpi.json: different maintainers.
>
> Hypothetical question: if we didn't care for that, would this go into
> qapi/acpi.json?

Regardless of maintainers, GHES is part of ACPI APEI HEST, which is meant
for reporting hardware errors. Such hardware errors are typically handled
by the host OS, so the guest doesn't need to be aware of them[1].

So, IMO the best would be to keep APEI/HEST/GHES in a separate file.

[1] Still, I can foresee some scenarios where passing some errors to the
    guest could make sense.

>
> If yes, then should we call it acpi-ghes-cper.json or acpi-ghes.json
> instead?

Naming it acpi-ghes, acpi-hest or acpi-ghes-cper would work equally well
from my side.

>
> >  ppc4xx
> >  L: qemu-...@nongnu.org
> >  S: Orphan
>
> [...]
>
> > diff --git a/qapi/ghes-cper.json b/qapi/ghes-cper.json
> > new file mode 100644
> > index ..3cc4f9f2aaa9
> > --- /dev/null
> > +++ b/qapi/ghes-cper.json
> > @@ -0,0 +1,55 @@
> > +# -*- Mode: Python -*-
> > +# vim: filetype=python
> > +
> > +##
> > +# = GHESv2 CPER Error Injection
> > +#
> > +# These are defined at
> > +# ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> > +# (GHESv2 - Type 10)
> > +##
>
> Feels a bit terse. These what?
>
> The reference could be clearer: "defined in the ACPI Specification 6.2,
> section 18.3.2.8 Generic Hardware Error Source version 2". A link would
> be nice, if it's stable.
I can add a link, but only newer ACPI versions are hosted in HTML format (e.g. only versions 6.4 and 6.5 are available as HTML at uefi.org). Can I put something like: Defined since ACPI Specification 6.2, section 18.3.2.8 Generic Hardware Error Source version 2. See: https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#generic-hardware-error-source-version-2-ghesv2-type-10 i.e. having the link point to ACPI 6.4 or 6.5 instead of 6.2? > # @raw-data: payload of the CPER encoded in base64 > > Have you considered naming this @payload instead? Works for me. Thanks, Mauro
Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
On Thu, 8 Aug 2024 10:11:07 +0200, Igor Mammedov wrote: > On Wed, 7 Aug 2024 15:25:47 +0100 > Jonathan Cameron wrote: > > > On Tue, 6 Aug 2024 16:31:13 +0200 > > Igor Mammedov wrote: > > > > > On Fri, 2 Aug 2024 23:44:01 +0200 > > > Mauro Carvalho Chehab wrote: > > > > > > > Provide a generic interface for error injection via GHESv2. > > > > > > > > This patch is co-authored: > > > > - original ghes logic to inject a simple ARM record by Shiju Jose; > > > > - generic logic to handle block addresses by Jonathan Cameron; > > > > - generic GHESv2 error inject by Mauro Carvalho Chehab; > > > > > > > > Co-authored-by: Jonathan Cameron > > > > Co-authored-by: Shiju Jose > > > > Co-authored-by: Mauro Carvalho Chehab > > > > Cc: Jonathan Cameron > > > > Cc: Shiju Jose > > > > Signed-off-by: Mauro Carvalho Chehab > > > > --- > > > > hw/acpi/ghes.c | 159 ++--- > > > > hw/acpi/ghes_cper.c| 2 +- > > > > include/hw/acpi/ghes.h | 3 + > > > > 3 files changed, 152 insertions(+), 12 deletions(-) > > > > > > > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c > > > > index a745dcc7be5e..e125c9475773 100644 > > > > --- a/hw/acpi/ghes.c > > > > +++ b/hw/acpi/ghes.c > > > > @@ -395,23 +395,22 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, > > > > FWCfgState *s, > > > > ags->present = true; > > > > } > > > > > > > > +static uint64_t ghes_get_state_start_address(void) > > > > > > ghes_get_hardware_errors_address() might better reflect what address it > > > will return > > > > > > > +{ > > > > +AcpiGedState *acpi_ged_state = > > > > +ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL)); > > > > +AcpiGhesState *ags = &acpi_ged_state->ghes_state; > > > > + > > > > +return le64_to_cpu(ags->ghes_addr_le); > > > > +} > > > > + > > > > int acpi_ghes_record_errors(uint8_t source_id, uint64_t > > > > physical_address) > > > > { > > > > uint64_t error_block_addr, read_ack_register_addr, > > > > read_ack_register = 0; > > > > -uint64_t start_addr; > > > > +uint64_t start_addr =
ghes_get_state_start_address(); > > > > bool ret = -1; > > > > -AcpiGedState *acpi_ged_state; > > > > -AcpiGhesState *ags; > > > > - > > > > assert(source_id < ACPI_HEST_SRC_ID_RESERVED); > > > > > > > > -acpi_ged_state = ACPI_GED(object_resolve_path_type("", > > > > TYPE_ACPI_GED, > > > > - NULL)); > > > > -g_assert(acpi_ged_state); > > > > -ags = &acpi_ged_state->ghes_state; > > > > - > > > > -start_addr = le64_to_cpu(ags->ghes_addr_le); > > > > - > > > > if (physical_address) { > > > > start_addr += source_id * sizeof(uint64_t); > > > > > > above should be a separate patch > > > > > > > > > > > @@ -448,9 +447,147 @@ int acpi_ghes_record_errors(uint8_t source_id, > > > > uint64_t physical_address) > > > > return ret; > > > > } > > > > > > > > +/* > > > > + * Error register block data layout > > > > + * > > > > + * | +-+ ges.ghes_addr_le > > > > + * | |error_block_address0 | > > > > + * | +-+ > > > > + * | |error_block_address1 | > > > > + * | +-+ --+-- > > > > + * | |.| GHES_ADDRESS_SIZE > > > > + * | +-+ --+-- > > > > + * | |error_block_addressN | > > > > + * | +-+ > > > > + * | | read_ack0 | > > > > + * | +-+ --+-- > > > > + * | | read_ack1 | GHES_ADDRESS_SIZE > > > > + * | +-+ --+-- > > > > + * | | . | > > > > + * | +-+ > > > > + * | | read_ackN | > > > > + * | +--
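The register layout in the comment above implies that the offsets of one source's error_block_address and read_ack entries depend only on the source index. The sketch below is hypothetical Python, not QEMU code. It assumes N sources, 8-byte registers (GHES_ADDRESS_SIZE), and offsets measured from the start of the area that ghes_addr_le points to.

```python
from typing import Tuple

GHES_ADDRESS_SIZE = 8  # each register holds one 64-bit value


def ghes_register_offsets(source_id: int, num_sources: int) -> Tuple[int, int]:
    """Return (error_block_address, read_ack) offsets for one error source.

    Layout: N error_block_address entries followed by N read_ack entries,
    all relative to the start of the register area (ghes_addr_le).
    """
    if not 0 <= source_id < num_sources:
        raise ValueError(f"source_id {source_id} out of range")
    error_block_address = source_id * GHES_ADDRESS_SIZE
    read_ack = (num_sources + source_id) * GHES_ADDRESS_SIZE
    return error_block_address, read_ack


if __name__ == "__main__":
    for sid in range(2):
        print(sid, ghes_register_offsets(sid, num_sources=2))
```

Hardcoded arithmetic like this is what becomes fragile if the layout changes between QEMU versions. That is the migration concern Igor raises later in the thread.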
Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
On Thu, 8 Aug 2024 16:58:38 -0400, John Snow wrote: > On Fri, Aug 2, 2024 at 5:44 PM Mauro Carvalho Chehab < > mchehab+hua...@kernel.org> wrote: > > > +#!/usr/bin/env python3 > > +# > > +# pylint: disable=C0301, C0114, R0912, R0913, R0914, R0915, W0511 > > > > Out of curiosity, what tools are you using to delint your files Primarily I use pylint, almost always with disable line(s), as these lint tools have some warnings that sound too silly (like too many/too few functions/branches/arguments...). From time to time, I review the disable lines, to keep the code as clean as desired. Sometimes I also use pep8 (now named pycodestyle) and black, especially when I want some auto-format hints (I manually commit the hunks that make sense), but I prefer pylint as the primary checking tool. I'm not too fond of black's coding style, though [1]. [1] For instance, black would make this change: -g_arm.add_argument("--arm", "--arm-valid", - help=f"ARM valid bits: {arm_valid_bits}") +g_arm.add_argument( +"--arm", "--arm-valid", help=f"ARM valid bits: {arm_valid_bits}" +) IMO, the original coding style I wrote is a lot better than black's suggestion - and it is closer to the C style I use in the Linux Kernel ;-) > and how are > you invoking them? I don't play much with such tools, though. I usually just invoke them with the Python file name(s), without passing any parameters or creating any configuration file. > I don't really maintain any strict regime for python files under > qemu.git/scripts (yet), so I am mostly curious as to what regimes others > are using currently. I don't see most QEMU contributors checking in pylint > ignores etc directly into the files, so it caught my eye. Having some verification sounds interesting, as it may help prevent some hidden bugs (like redefining a variable that was already used globally), provided the checks are not too picky and stupid warnings can be bypassed. Regards, Mauro
Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
On Thu, 8 Aug 2024 17:21:33 -0400, John Snow wrote: > On Fri, Aug 2, 2024 at 5:44 PM Mauro Carvalho Chehab < > mchehab+hua...@kernel.org> wrote: > > > diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py > > new file mode 100644 > > index ..13fae7a7af0e > > --- /dev/null > > +++ b/scripts/qmp_helper.py > > > > I'm going to admit I only glanced at this very briefly, but -- is there a > chance you could use qemu.git/python/qemu/qmp instead of writing your own > helpers here? > > If *NOT*, is there something that I need to add to our QMP library to > facilitate your script? I started writing this script to be hosted outside the QEMU tree, back when we had a very different API. Later I noticed the QMP library, and even tried to write a patch for it, but I gave up due to the asyncio complexity... Please notice that I actually placed three classes in this file: - qmp - util - cper_guid I could probably make the first one a subclass of QEMUMonitorProtocol (besides the normal open/close/cmd communication, it also contains some methods that are specific to the error injection use case): - to generate a CPER record; - to search for data via qom-get. The other two classes are just common code used by ghes_inject commands. My idea is to have multiple commands to do different kinds of GHES error injection, each command in a different file/class. > > +s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) > > +try: > > +s.connect((host, port)) > > +except ConnectionRefusedError: > > +sys.exit(f"Can't connect to QMP host {host}:{port}") > > > > You should be able to use e.g. > > legacy.py's QEMUMonitorProtocol class for synchronous connections, e.g. > > from qemu.qmp.legacy import QEMUMonitorProtocol > > qmp = QEMUMonitorProtocol((host, port)) > qmp.connect(negotiate=True) That sounds interesting! I'll give it a try. > If you want to run the script w/o setting up a virtual environment or > installing the package, take a look at the hacks in scripts/qmp/ for how I > support e.g.
qom-get directly from the source tree. Yeah, I saw that already. Doing: sys.path.append(path.join(qemu_dir, 'python')) the same way qom-get does should do the trick. > > + > > +data = s.recv(1024) > > +try: > > +obj = json.loads(data.decode("utf-8")) > > +except json.JSONDecodeError as e: > > +print(f"Invalid QMP answer: {e}") > > +s.close() > > +return > > + > > +if "QMP" not in obj: > > +print(f"Invalid QMP answer: {data.decode("utf-8")}") > > +s.close() > > +return > > + > > +for i, command in enumerate(commands): > > > > Then here you'd use qmp.cmd (raises exception on QMPError) or qmp.cmd_raw > or qmp.cmd_obj (returns the QMP response as the return value even if it was > an error.) Good to know, I'll try and see what fits best. > More details: > https://qemu.readthedocs.io/projects/python-qemu-qmp/en/latest/qemu.qmp.legacy.html I'll take a look. The name "legacy" is a little scary, as it might imply that this has been deprecated. If there are no plans to deprecate it, then it would be great to use it and simplify the code a little bit. > There's also an async version, but it doesn't look like you require that > complexity, so you can ignore it. Yes, that's the case: a serialized, synchronous send/response logic works perfectly for this script. No need to be burdened with asyncio complexity. Thanks, Mauro
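The hand-written helper quoted above reads each reply with a single recv(1024). For comparison, here is a minimal sketch of the same greeting, qmp_capabilities, and command exchange over a plain socket. It assumes QMP messages are newline-terminated JSON objects, and it skips asynchronous events while waiting for a reply. qmp_session() is a hypothetical name. The QEMUMonitorProtocol class John suggests remains the simpler option.

```python
import json
import socket


def qmp_session(sock: socket.socket, commands):
    """Negotiate capabilities and run QMP commands on a connected socket.

    QMP messages are newline-terminated JSON objects, so replies are read
    line by line instead of with a single recv(1024), which can truncate
    large replies. Asynchronous events (objects with an "event" key) are
    skipped while waiting for a command reply.
    """
    reader = sock.makefile("rb")

    def read_reply():
        while True:
            line = reader.readline()
            if not line:
                raise ConnectionError("QMP connection closed")
            msg = json.loads(line)
            if "event" not in msg:
                return msg

    greeting = read_reply()
    if "QMP" not in greeting:
        raise ValueError(f"Invalid QMP greeting: {greeting}")

    replies = []
    for cmd in [{"execute": "qmp_capabilities"}, *commands]:
        sock.sendall(json.dumps(cmd).encode("utf-8") + b"\n")
        replies.append(read_reply())
    return replies[1:]  # drop the reply to qmp_capabilities
```

Suppose QEMU was started with `-qmp tcp:localhost:4445,server=on,wait=off`. The function could then be called as `qmp_session(socket.create_connection(("localhost", 4445)), [{"execute": "query-status"}])`.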
Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
On Fri, 9 Aug 2024 00:41:37 +0200, Mauro Carvalho Chehab wrote: > > You should be able to use e.g. > > > > legacy.py's QEMUMonitorProtocol class for synchronous connections, e.g. > > > > from qemu.qmp.legacy import QEMUMonitorProtocol > > > > qmp = QEMUMonitorProtocol((host, port)) > > qmp.connect(negotiate=True) > > That sounds interesting! I give it a try. I applied the enclosed patch at the end of my patch series, but somehow it is not working. For whatever reason, connect() is raising a StateError apparently due to Runstate.CONNECTING. I tried both declaring (see enclosed patch): class qmp(QEMUMonitorProtocol) and using: -super().__init__(self.host, self.port) +self.qmp_monitor = QEMUMonitorProtocol(self.host, self.port) In both cases, it keeps waiting forever for a connection. Regards, Mauro --- diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py index e9e9388bcb8b..62ca267cdc87 100644 --- a/scripts/qmp_helper.py +++ b/scripts/qmp_helper.py @@ -9,9 +9,23 @@ import socket import sys +from os import path + +try: +qemu_dir = path.abspath(path.dirname(path.dirname(__file__))) +sys.path.append(path.join(qemu_dir, 'python')) + +from qemu.qmp.legacy import QEMUMonitorProtocol +from qemu.qmp.protocol import StateError + +except ModuleNotFoundError as exc: +print(f"Module '{exc.name}' not found.") +print("Try export PYTHONPATH=top-qemu-dir/python or run from top-qemu-dir") +sys.exit(1) + from base64 import b64encode -class qmp: +class qmp(QEMUMonitorProtocol): """ Opens a connection and send/receive QMP commands.
""" @@ -21,22 +35,20 @@ def send_cmd(self, command, may_open=False,return_error=True): if may_open: self._connect() -elif not self.socket: -return None +elif not self.connected: +return False if isinstance(command, dict): data = json.dumps(command).encode("utf-8") else: data = command.encode("utf-8") -self.socket.sendall(data) -data = self.socket.recv(1024) try: -obj = json.loads(data.decode("utf-8")) -except json.JSONDecodeError as e: -print(f"Invalid QMP answer: {e}") -self._close() -return None +obj = self.cmd_obj(command) +except Exception as e: +print("Failed to inject error: {e}.") + +print(obj) if "return" in obj: if isinstance(obj.get("return"), dict): @@ -46,86 +58,47 @@ def send_cmd(self, command, may_open=False,return_error=True): else: return obj["return"] -elif isinstance(obj.get("error"), dict): -error = obj["error"] -if return_error: -print(f'{error["class"]}: {error["desc"]}') -else: -print(json.dumps(obj)) - return None def _close(self): """Shutdown and close the socket, if opened""" -if not self.socket: +if not self.connected: return -self.socket.shutdown(socket.SHUT_WR) -while 1: -data = self.socket.recv(1024) -if data == b"": -break -try: -obj = json.loads(data.decode("utf-8")) -except json.JSONDecodeError as e: -print(f"Invalid QMP answer: {e}") -self.socket.close() -self.socket = None -return - -if isinstance(obj.get("return"), dict): -print(json.dumps(obj["return"])) -if isinstance(obj.get("error"), dict): -error = obj["error"] -print(f'{error["class"]}: {error["desc"]}') -else: -print(json.dumps(obj)) - -self.socket.close() -self.socket = None +self.close() +self.connected = False def _connect(self): """Connect to a QMP TCP/IP port, if not connected yet""" -if self.socket: +if self.connected: return True -self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM) -try: -self.socket.connect((self.host, self.port)) -except ConnectionRefusedError: -sys.exit(f"Can't connect to QMP host {self.host}:{self.port}") - -data = 
self.socket.recv(1024) -try: -obj = json.loads(data.decode("utf-8")) -
Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
On Fri, 9 Aug 2024 08:26:09 +0200, Mauro Carvalho Chehab wrote: > On Fri, 9 Aug 2024 00:41:37 +0200 > Mauro Carvalho Chehab wrote: > > > > You should be able to use e.g. > > > > > > legacy.py's QEMUMonitorProtocol class for synchronous connections, e.g. > > > > > > from qemu.qmp.legacy import QEMUMonitorProtocol > > > > > > qmp = QEMUMonitorProtocol((host, port)) > > > qmp.connect(negotiate=True) > > > > That sounds interesting! I give it a try. > > I applied the enclosed patch at the end of my patch series, but > somehow it is not working. For whatever reason, connect() is > raising a StateError apparently due to Runstate.CONNECTING. > > I tried both as declaring (see enclosed patch): > > class qmp(QEMUMonitorProtocol) > > and using: > > -super().__init__(self.host, self.port) > +self.qmp_monitor = QEMUMonitorProtocol(self.host, self.port) > > On both cases, it keeps waiting forever for a connection. Never mind, passing host/port as a single tuple made it work. The enclosed patch converts the script to use QEMUMonitorProtocol. I'll fold it into the script for the next spin of this series. Regards, Mauro --- [PATCH] scripts/qmp_helper.py: use QEMUMonitorProtocol class Instead of reinventing the wheel, let's use QEMUMonitorProtocol.
Signed-off-by: Mauro Carvalho Chehab diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py index 756935a2263c..f869f07860b8 100644 --- a/scripts/arm_processor_error.py +++ b/scripts/arm_processor_error.py @@ -169,14 +169,11 @@ def send_cper(self, args): if args.mpidr: cper["mpidr-el1"] = arg["mpidr"] elif cpus: -get_mpidr = { -"execute": "qom-get", -"arguments": { -'path': cpus[0], -'property': "x-mpidr" -} +cmd_arg = { +'path': cpus[0], +'property': "x-mpidr" } -ret = qmp_cmd.send_cmd(get_mpidr, may_open=True) +ret = qmp_cmd.send_cmd("qom-get", cmd_arg, may_open=True) if isinstance(ret, int): cper["mpidr-el1"] = ret else: @@ -291,8 +288,7 @@ def send_cper(self, args): context_info_num = 0 if ctx: -ret = qmp_cmd.send_cmd('{ "execute": "query-target" }', - may_open=True) +ret = qmp_cmd.send_cmd("query-target", may_open=True) default_ctx = self.CONTEXT_MISC_REG @@ -363,14 +359,11 @@ def send_cper(self, args): if "midr-el1" not in arg: if cpus: -get_mpidr = { -"execute": "qom-get", -"arguments": { -'path': cpus[0], -'property': "midr" -} +cmd_arg = { +'path': cpus[0], +'property': "midr" } -ret = qmp_cmd.send_cmd(get_mpidr, may_open=True) +ret = qmp_cmd.send_cmd("qom-get", cmd_arg, may_open=True) if isinstance(ret, int): arg["midr-el1"] = ret diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py index 7214c15c6718..e2e0a881f6c1 100644 --- a/scripts/qmp_helper.py +++ b/scripts/qmp_helper.py @@ -9,6 +9,19 @@ import socket import sys +from os import path + +try: +qemu_dir = path.abspath(path.dirname(path.dirname(__file__))) +sys.path.append(path.join(qemu_dir, 'python')) + +from qemu.qmp.legacy import QEMUMonitorProtocol + +except ModuleNotFoundError as exc: +print(f"Module '{exc.name}' not found.") +print("Try export PYTHONPATH=top-qemu-dir/python or run from top-qemu-dir") +sys.exit(1) + from base64 import b64encode class qmp: @@ -16,26 +29,23 @@ class qmp: Opens a connection and send/receive QMP commands. 
""" -def send_cmd(self, command, may_open=False, return_error=True): +def send_cmd(self, command, args=None, may_open=False, return_error=True): """Send a command to QMP, optinally opening a connection""" if may_open: self._connect() -elif not self.socket: -return None +elif not self.connected: +return False -if isinstance(command, dict): -data = json.dumps(command).encode("utf-8") -else: -
Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
On Thu, 8 Aug 2024 19:33:32 -0400, John Snow wrote: > > > Then here you'd use qmp.cmd (raises exception on QMPError) or qmp.cmd_raw > > > or qmp.cmd_obj (returns the QMP response as the return value even if it > > was > > > an error.) > > > > Good to know, I'll try and see what fits best. > > > > I might *suggest* you try to use the exception-raising interface and catch > exceptions to interrogate expected errors as it aligns better with the > "idiomatic python API" - I have no plans to support an external API that > *returns* error objects except via the exception class. This approach will > be easier to port when I drop the legacy interface in the future, see below. > > But, that said, whichever is easiest. We use all three interfaces in many > places in the QEMU tree. I have no grounds to require you to use a specific > one ;) While Python-style exception handling is cool, I ended up opting to use cmd_obj(), as the script needs to detect the end of the /machine/unattached/device[] array, and using cmd_obj() made the conversion easier. One of the things I missed in the documentation is a description of the possible exceptions that cmd() could raise. It is probably worth documenting them and placing them in a QMP-specific error class, but a change like that would probably be incompatible with existing applications. Probably something to consider on your TODO list for moving this away from legacy ;-) Anyway, I already folded the changes into the branch I'll be using as the basis for the next submission (be careful when using it, as I'm constantly rebasing it): https://gitlab.com/mchehab_kernel/qemu/-/commit/62feb8f6037ab762a9848eb601a041fbbbe2a77a#b665bcbc1e5ae3a488f1c0f20f8c29ae640bfa63_0_17 Thanks, Mauro
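The cmd_obj() choice can be illustrated with a hypothetical helper in the spirit of the reworked send_cmd(command, args=None, ...) from the earlier patch. Because errors come back as data, detecting the end of /machine/unattached/device[] becomes a plain dictionary check rather than exception handling. QmpHelper and find_arm_cpus() are illustrative names, not the script's actual code. The sketch makes three assumptions:

- every QOM object exposes a "type" property;
- ARM CPU type names end in "-arm-cpu";
- the monitor object (for example, a connected QEMUMonitorProtocol created with a (host, port) tuple) provides cmd_obj().

```python
import json
from typing import Any, List, Optional


class QmpHelper:
    """Hypothetical wrapper around a monitor exposing cmd_obj(dict) -> dict."""

    def __init__(self, monitor):
        self.monitor = monitor

    def send_cmd(self, command: str, args: Optional[dict] = None,
                 return_error: bool = True) -> Optional[Any]:
        """Send one QMP command; return its "return" value or None on error."""
        msg = {"execute": command}
        if args:
            msg["arguments"] = args
        obj = self.monitor.cmd_obj(msg)  # raw reply, even for errors
        if "return" in obj:
            return obj["return"]
        error = obj.get("error")
        if isinstance(error, dict):
            if return_error:
                print(f'{error["class"]}: {error["desc"]}')
            else:
                print(json.dumps(obj))
        return None

    def find_arm_cpus(self, max_devices: int = 4096) -> List[str]:
        """List /machine/unattached/device[N] paths whose type is an ARM CPU."""
        cpus = []
        for i in range(max_devices):
            path = f"/machine/unattached/device[{i}]"
            reply = self.monitor.cmd_obj({
                "execute": "qom-get",
                "arguments": {"path": path, "property": "type"},
            })
            if "error" in reply:
                break  # no device[i]: end of the array, not a real failure
            if str(reply.get("return", "")).endswith("-arm-cpu"):
                cpus.append(path)
        return cpus
```

With an exception-raising cmd(), the same loop would need a try/except around each qom-get. That is the alternative John recommends for code that should survive the eventual removal of the legacy interface.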
Re: [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection
On Thu, 08 Aug 2024 16:45:51 +0200, Markus Armbruster wrote: > Igor Mammedov writes: > > > On Thu, 8 Aug 2024 16:11:41 +0200 > > Mauro Carvalho Chehab wrote: > > > >> On Thu, 08 Aug 2024 10:50:33 +0200, > >> Markus Armbruster wrote: > >> > >> > Mauro Carvalho Chehab writes: > >> > >> > > diff --git a/MAINTAINERS b/MAINTAINERS > >> > > index 98eddf7ae155..655edcb6688c 100644 > >> > > --- a/MAINTAINERS > >> > > +++ b/MAINTAINERS > >> > > @@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c > >> > > F: include/hw/acpi/ghes.h > >> > > F: docs/specs/acpi_hest_ghes.rst > >> > > > >> > > +ACPI/HEST/GHES/ARM processor CPER > >> > > +R: Mauro Carvalho Chehab > >> > > +S: Maintained > >> > > +F: hw/arm/ghes_cper.c > >> > > +F: hw/acpi/ghes_cper_stub.c > >> > > +F: qapi/ghes-cper.json > >> > > + > >> > > >> > Here's the reason for creating a new QAPI module instead of adding to > >> > existing module acpi.json: different maintainers. > >> > > >> > Hypothetical question: if we didn't care for that, would this go into > >> > qapi/acpi.json? > >> > >> Independently of maintainers, GHES is part of ACPI APEI HEST, meaning > >> to report hardware errors. Such hardware errors are typically handled by > >> the host OS, so quest doesn't need to be aware of that[1]. > >> > >> So, IMO the best would be to keep APEI/HEST/GHES in a separate file. > >> > >> [1] still, I can foresee some scenarios were passing some errors to the > >> guest could make sense. > >> > >> > > >> > If yes, then should we call it acpi-ghes-cper.json or acpi-ghes.json > >> > instead? > >> > >> Naming it as acpi-ghes,acpi-hest or acpi-ghes-cper would equally work > >> from my side. > > > > if we going to keep it generic, acpi-hest would do > > Works for me. OK, I'll do the rename. Regarding the files implementing support for it, hw/acpi/ghes_cper.c and hw/acpi/ghes_cper_stub.c, I guess there's no need to rename them, right?
IMO such names are better than acpi/hest.c, especially since the actual implementation for HEST is inside acpi/ghes.c. > > >> > > ppc4xx >> > > L: qemu-...@nongnu.org >> > > S: Orphan >> > >> > [...] >> > >> > > diff --git a/qapi/ghes-cper.json b/qapi/ghes-cper.json >> > > new file mode 100644 >> > > index ..3cc4f9f2aaa9 >> > > --- /dev/null >> > > +++ b/qapi/ghes-cper.json >> > > @@ -0,0 +1,55 @@ >> > > +# -*- Mode: Python -*- >> > > +# vim: filetype=python >> > > + >> > > +## >> > > +# = GHESv2 CPER Error Injection >> > > +# >> > > +# These are defined at >> > > +# ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2 >> > > +# (GHESv2 - Type 10) >> > > +## >> > >> > Feels a bit terse. These what? >> > >> > The reference could be clearer: "defined in the ACPI Specification 6.2, >> > section 18.3.2.8 Generic Hardware Error Source version 2". A link would >> > be nice, if it's stable. >> >> I can add a link, but only newer ACPI versions are hosted in html format >> (e. g. only versions 6.4 and 6.5 are available as html at uefi.org). > > some years earlier it could be said 'stable link' about acpi spec hosted > elsewhere. Not the case anymore after umbrella change. > > spec name, rev, chapter worked fine for acpi code (it's easy to find > wherever spec is hosted). > Probably the same would work for QAPI, I'm not QAPI maintainer though, > so preffered approach here is absolutely up to you. > > A link is strictly optional. Stable links are nice, stale links are > annoying. Mauro, you decide :) Well, I guess I'll add a link then, keeping the plain-text reference as well. Changes of umbrella organization don't happen too often. Hopefully the specs will stay under uefi.org for a long time, if not forever. If not, we can always drop the link. Thanks, Mauro
Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
On Mon, 12 Aug 2024 11:39:00 +0200, Igor Mammedov wrote: > > We may also store cper_offset there via bios_linker_loader_add_pointer() > > and/or use bios_linker_loader_write_pointer(), but I can't see how the > > data stored there can be retrieved, nor any advantage of using it instead > > of the current code, as, in the end, we'll have 3 addresses that will be > > used: > > > > - an address where a pointer to CPER record will be stored; > > - an address where the ack will be stored; > > - an address where the actual CPER record will be stored. > > > > And those are calculated on a single function and are all stored at the > > ACPI table files. > > > > What am I missing? > > That's basically (2) approach and it works to some degree, > unfortunately it's fragile when we start talking about migration > and changing layout in the future. > > Lets take as example increasing size of 1) 'Generic Error Status Block', > we are considering. Old QEMU will, tell firmware to allocate 1K buffer > for it and calculated offsets to [1] (that you've stored/calculated) will > include this assumption. > Then in newer we QEMU increase size of [1] and all hardcoded offsets will > account for new size, but if we migrate guest from old QEMU to this newer > one all HEST tables layout within guest will match old QEMU assumptions, > and as result newer QEMU with larger block size will write CPERs at wrong > address considering we are still running guest from old QEMU. > That's just one example. > > To make it work there a number of ways, but the ultimate goal is to pick > one that's the least fragile and won't snowball in maintenance nightmare > as number of GHES sources increases over time. > > This series tries to solve problem of mapping GHES source to > a corresponding 'Generic Error Status Block' and related registers. > However we are missing access to this mapping since it only > exists in guest patched HEST (i.e in guest RAM only).
> > The robust way to make it work would be for QEMU to get a pointer > to whole HEST table and then enumerate GHES sources and related > error/ack registers directly from guest RAM (sidestepping layout > change issues this way). > > what I'm proposing is to use bios_linker_loader_write_pointer() > (only once) so that firmware could tell QEMU address of HEST table, > in which one can find a GHES source and always correct error/ack > pointers (regardless of table[s] layout changes). OK, got it. Such a change was not easy, but I finally figured out how to make it actually work. Tomorrow I'll address your comment on patch 5/10 about also using raw data for the other parts of the CPER (generic error status and generic error data). If you want a sneak peek, I'm keeping the latest development version here: https://gitlab.com/mchehab_kernel/qemu/-/commits/qemu_submission?ref_type=heads In particular, the patch changing from an /etc/hardware_errors offset to a HEST offset is at: https://gitlab.com/mchehab_kernel/qemu/-/commit/9197d22de09df97ce3d6725cb21bd2114c2eb43c It contains several cleanups to make the logic clearer and more robust. Thanks, Mauro
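Igor suggests that QEMU locate the error/ack registers by walking the guest-patched HEST instead of relying on hardcoded offsets. A minimal, hypothetical Python sketch of such a walk over a HEST byte buffer follows; it is not QEMU's implementation. The field offsets are my reading of the ACPI GHES (type 9, 64 bytes) and GHESv2 (type 10, 92 bytes) structures and should be checked against the spec. Other error source types are not handled.

```python
import struct
from typing import Tuple

ACPI_HEADER_LEN = 36            # standard ACPI table header
GHES_TYPE, GHESV2_TYPE = 9, 10
GHES_LEN, GHESV2_LEN = 64, 92   # fixed sizes of the two structures
ERROR_STATUS_GAS = 20           # offset of Error Status Address (GAS)
READ_ACK_GAS = 64               # offset of Read Ack Register (GAS), v2 only
GAS_ADDRESS = 4                 # 64-bit address follows 4 bytes of GAS info


def find_ghes_registers(hest: bytes, source_id: int) -> Tuple[int, int]:
    """Return (error_status_address, read_ack_address) for a GHESv2 source."""
    (count,) = struct.unpack_from("<I", hest, ACPI_HEADER_LEN)
    offset = ACPI_HEADER_LEN + 4   # error source structures start here
    for _ in range(count):
        src_type, src_id = struct.unpack_from("<HH", hest, offset)
        if src_type == GHES_TYPE:
            length = GHES_LEN
        elif src_type == GHESV2_TYPE:
            length = GHESV2_LEN
        else:
            raise ValueError(f"unsupported error source type {src_type}")
        if src_type == GHESV2_TYPE and src_id == source_id:
            (err_addr,) = struct.unpack_from(
                "<Q", hest, offset + ERROR_STATUS_GAS + GAS_ADDRESS)
            (ack_addr,) = struct.unpack_from(
                "<Q", hest, offset + READ_ACK_GAS + GAS_ADDRESS)
            return err_addr, ack_addr
        offset += length
    raise KeyError(source_id)
```

Because the addresses are read from the table the guest actually has, a newer QEMU would still find the right registers after migration, even if its own default layout differs. That is the point of Igor's proposal.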
Re: [PATCH v6 04/10] qapi/ghes-cper: add an interface to do generic CPER error injection
On Mon, 12 Aug 2024 13:57:44 +0200, Igor Mammedov wrote: > n Platform Error Record - CPER - as defined at the UEFI > > +# specification. See > > +# > > https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#record-header > > +# for more details. > > +# > > +# @notification-type: pre-assigned GUID string indicating the record > > +# association with an error event notification type, as defined > > +# at > > https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#record-header > > +# > > +# @raw-data: Contains a base64 encoded string with the payload of > > +# the CPER. > > +# > > +# Since: 9.2 > > +## > > +{ 'struct': 'CommonPlatformErrorRecord', > > + 'data': { > > + 'notification-type': 'str', > > like was mentioned at v5 review, > you only need this for setting cper notification type if you are (re)using > > acpi_ghes_generic_error_status() && acpi_ghes_generic_error_data() > > however while doing this in (6/10), you are also limiting what > could be encoded in headers to some hardcoded values. > > Given QEMU doesn't need to know anything about notification type, > modulo putting it data block header, it would be beneficial > to drop 'notification type' from QAPI interface, and include > error status block and error data headers in raw-data. > > This way it should be possible to change headers within python script > without affecting QEMU and QAPI interface. On top of that > ghes_record_cper_errors() could be simplified by dropping (in 6/10) >acpi_ghes_generic_error_status() && acpi_ghes_generic_error_data() > and just copying raw-data as is directly into error buffer (assuming > script put needed headers cper data). > > From fusing pov it's also beneficial to try generate junk error status > block headers, for which python script looks like ideal place to put > it in. Got it.
I'll change it to just: { 'command': 'ghes-cper', 'data': { 'cper': 'str' }, 'features': [ 'unstable' ] } where cper contains a base64-encoded string with the entire raw data, including the generic error status and generic error data. I'm moving the current defaults to the Python script. Let's merge this with the defaults there; the script can later be modified to allow changing them. Thanks, Mauro
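With this interface, the script has to prepend the generic error status and generic error data headers itself. It might assemble the `cper` argument roughly as below. This is a hypothetical sketch with these assumptions:

- the header layouts follow my reading of the ACPI Generic Error Status Block (20 bytes) and Generic Error Data Entry revision 0x300 (72 bytes);
- severity 0 (recoverable) is an assumed default;
- the GUID used in the test is the UEFI ARM processor error section GUID.

```python
import base64
import struct
import uuid

ACPI_GEBS_UNCORRECTABLE = 1 << 0  # Block Status: uncorrectable error valid
GEDE_REVISION = 0x300             # Generic Error Data Entry revision


def build_ghes_cper_cmd(section_guid: str, payload: bytes,
                        severity: int = 0) -> dict:
    """Return a ghes-cper QMP command: status block + data entry + payload."""
    # Generic Error Data Entry header (72 bytes), followed by the section data
    entry = struct.pack(
        "<16sIHBBI16s20sQ",
        uuid.UUID(section_guid).bytes_le,  # section type, GUID byte order
        severity,
        GEDE_REVISION,
        0,                # validation bits: no FRU id/text, no timestamp
        0,                # flags
        len(payload),     # error data length
        bytes(16),        # FRU id
        bytes(20),        # FRU text
        0,                # timestamp
    ) + payload
    # Generic Error Status Block header (20 bytes)
    status = struct.pack(
        "<5I",
        ACPI_GEBS_UNCORRECTABLE | (1 << 4),  # bits 13:4: one data entry
        0,                # raw data offset (unused)
        0,                # raw data length (unused)
        len(entry),       # data length: everything after this header
        severity,
    )
    cper = base64.b64encode(status + entry).decode("ascii")
    return {"execute": "ghes-cper", "arguments": {"cper": cper}}
```

Keeping all header fields on the script side also makes the fuzzing use case Igor mentions straightforward: any field can be deliberately set to junk without touching QEMU.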
[PATCH v7 09/10] target/arm: add an experimental mpidr arm cpu property object
Accurately injecting an ARM Processor Error ACPI/APEI GHES error record requires the value of the ARM Multiprocessor Affinity Register (MPIDR). While QEMU's ARM emulation implements it, the value is currently not visible from outside the guest. Add a field to the CPU storing it, and expose it in arm_cpu_properties as experimental, allowing it to be queried via QMP with the qom-get command. Signed-off-by: Mauro Carvalho Chehab --- target/arm/cpu.c| 1 + target/arm/cpu.h| 1 + target/arm/helper.c | 10 -- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/target/arm/cpu.c b/target/arm/cpu.c index 19191c239181..30fcf0a10f46 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -2619,6 +2619,7 @@ static ObjectClass *arm_cpu_class_by_name(const char *cpu_model) static Property arm_cpu_properties[] = { DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0), +DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0), DEFINE_PROP_UINT64("mp-affinity", ARMCPU, mp_affinity, ARM64_AFFINITY_INVALID), DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID), diff --git a/target/arm/cpu.h b/target/arm/cpu.h index a12859fc5335..d2e86f0877cc 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -1033,6 +1033,7 @@ struct ArchCPU { uint64_t reset_pmcr_el0; } isar; uint64_t midr; +uint64_t mpidr; uint32_t revidr; uint32_t reset_fpsid; uint64_t ctr; diff --git a/target/arm/helper.c b/target/arm/helper.c index 8fb4b474e83f..16e75b7c5ed9 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -4692,7 +4692,7 @@ static uint64_t mpidr_read_val(CPUARMState *env) return mpidr; } -static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri) +static uint64_t mpidr_read(CPUARMState *env) { unsigned int cur_el = arm_current_el(env); @@ -4702,6 +4702,11 @@ static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri) return mpidr_read_val(env); } +static uint64_t mpidr_read_ri(CPUARMState *env, const ARMCPRegInfo *ri) +{ +return mpidr_read(env); +} + static const ARMCPRegInfo lpae_cp_reginfo[] = { /* NOP AMAIR0/1 */
{ .name = "AMAIR0", .state = ARM_CP_STATE_BOTH, @@ -9723,7 +9728,7 @@ void register_cp_regs_for_features(ARMCPU *cpu) { .name = "MPIDR_EL1", .state = ARM_CP_STATE_BOTH, .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 5, .fgt = FGT_MPIDR_EL1, - .access = PL1_R, .readfn = mpidr_read, .type = ARM_CP_NO_RAW }, + .access = PL1_R, .readfn = mpidr_read_ri, .type = ARM_CP_NO_RAW }, }; #ifdef CONFIG_USER_ONLY static const ARMCPRegUserSpaceInfo mpidr_user_cp_reginfo[] = { @@ -9733,6 +9738,7 @@ void register_cp_regs_for_features(ARMCPU *cpu) modify_arm_cp_regs(mpidr_cp_reginfo, mpidr_user_cp_reginfo); #endif define_arm_cp_regs(cpu, mpidr_cp_reginfo); +cpu->mpidr = mpidr_read(env); } if (arm_feature(env, ARM_FEATURE_AUXCR)) { -- 2.46.0
[PATCH v7 10/10] scripts/arm_processor_error.py: retrieve mpidr if not filled
Add support to retrieve mpidr value via qom-get. Signed-off-by: Mauro Carvalho Chehab --- scripts/arm_processor_error.py | 27 +++ 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py index 2643e4ddc5f3..f869f07860b8 100644 --- a/scripts/arm_processor_error.py +++ b/scripts/arm_processor_error.py @@ -5,12 +5,10 @@ # # Copyright (C) 2024 Mauro Carvalho Chehab -# TODO: current implementation has dummy defaults. -# -# For a better implementation, a QMP addition/call is needed to -# retrieve some data for ARM Processor Error injection: -# -# - ARM registers: power_state, mpidr. +# Note: currently it lacks a method to fill the ARM Processor Error CPER +# psci field from emulation. On a real hardware, this is filled only +# when a CPU is not running. Implementing support for it to simulate a +# real hardware is not trivial. import argparse import re @@ -168,11 +166,24 @@ def send_cper(self, args): else: cper["running-state"] = 0 +if args.mpidr: +cper["mpidr-el1"] = arg["mpidr"] +elif cpus: +cmd_arg = { +'path': cpus[0], +'property': "x-mpidr" +} +ret = qmp_cmd.send_cmd("qom-get", cmd_arg, may_open=True) +if isinstance(ret, int): +cper["mpidr-el1"] = ret +else: +cper["mpidr-el1"] = 0 + if arm_valid_init: if args.affinity: cper["valid"] |= self.arm_valid_bits["affinity"] -if args.mpidr: +if "mpidr-el1" in cper: cper["valid"] |= self.arm_valid_bits["mpidr"] if "running-state" in cper: @@ -356,7 +367,7 @@ def send_cper(self, args): if isinstance(ret, int): arg["midr-el1"] = ret -util.data_add(data, arg.get("mpidr-el1", 0), 8) +util.data_add(data, cper["mpidr-el1"], 8) util.data_add(data, arg.get("midr-el1", 0), 8) util.data_add(data, cper["running-state"], 4) util.data_add(data, arg.get("psci-state", 0), 4) -- 2.46.0
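The fallback order in this patch can be sketched as a small helper: use the command-line value first, otherwise qom-get of x-mpidr on the first CPU. The helper always yields a value to write, and it only sets the MPIDR valid bit when the value is actually known. That way the later 8-byte write never depends on whether a key was set. The helper name and the qom_get callable are hypothetical. Treating bit 0 as "MPIDR valid" is my reading of the UEFI ARM processor error section validation bits.

```python
from typing import Callable, Optional, Tuple

MPIDR_VALID = 1 << 0  # ARM processor error section: "MPIDR valid" bit


def resolve_mpidr(cli_mpidr: Optional[int],
                  qom_get: Optional[Callable[[str, str], Optional[int]]],
                  cpu_path: Optional[str]) -> Tuple[int, int]:
    """Return (mpidr, valid_bits) for the ARM processor error section.

    cli_mpidr: value given on the command line, or None.
    qom_get:   callable(path, prop) returning the property or None on error.
    cpu_path:  QOM path of the first CPU, or None if unknown.
    """
    mpidr = cli_mpidr
    if mpidr is None and qom_get is not None and cpu_path:
        ret = qom_get(cpu_path, "x-mpidr")
        if isinstance(ret, int):
            mpidr = ret
    if mpidr is None:
        return 0, 0  # unknown: write zero and leave the valid bit clear
    return mpidr, MPIDR_VALID
```

Checking `is None` rather than truthiness keeps an explicit MPIDR of 0 valid. That matters because CPU 0 commonly reports affinity 0.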
[PATCH v7 07/10] docs: acpi_hest_ghes: fix documentation for CPER size
While the spec document states a CPER size of 4 KiB for each record, the code currently sets it to 1 KiB. Fix the documentation and point to the macro name there, as this should help keep it up to date. Signed-off-by: Mauro Carvalho Chehab Acked-by: Igor Mammedov --- docs/specs/acpi_hest_ghes.rst | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst index 68f1fbe0a4af..c3e9f8d9a702 100644 --- a/docs/specs/acpi_hest_ghes.rst +++ b/docs/specs/acpi_hest_ghes.rst @@ -67,8 +67,10 @@ Design Details (3) The address registers table contains N Error Block Address entries and N Read Ack Register entries. The size for each entry is 8-byte. The Error Status Data Block table contains N Error Status Data Block -entries. The size for each entry is 4096(0x1000) bytes. The total size -for the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes. +entries. The size for each entry is defined at the source code as +ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size +for the "etc/hardware_errors" fw_cfg blob is +(N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes. N is the number of the kinds of hardware error sources. (4) QEMU generates the ACPI linker/loader script for the firmware. The -- 2.46.0
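Here is a worked example of the corrected formula. This series registers two error sources (SEA and GED), and the block size is currently 1 KiB, so the blob takes 2 * 8 * 2 + 2 * 1024 = 2080 bytes. The helper below only evaluates the documented formula. Its name is hypothetical, and the 1024 default mirrors the current value of ACPI_GHES_MAX_RAW_DATA_LENGTH.

```python
# Assumption: current value of ACPI_GHES_MAX_RAW_DATA_LENGTH (1 KiB) in hw/acpi/ghes.c
ACPI_GHES_MAX_RAW_DATA_LENGTH = 1024
ADDR_ENTRY_SIZE = 8  # each Error Block Address / Read Ack Register entry


def hardware_errors_blob_size(num_sources: int,
                              max_raw_data_length: int = ACPI_GHES_MAX_RAW_DATA_LENGTH) -> int:
    """Size of the etc/hardware_errors fw_cfg blob, per acpi_hest_ghes.rst.

    N Error Block Address entries plus N Read Ack Register entries
    (8 bytes each), followed by N Error Status Data Blocks.
    """
    return num_sources * ADDR_ENTRY_SIZE * 2 + num_sources * max_raw_data_length
```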
[PATCH v7 05/10] acpi/ghes: rework the logic to handle HEST source ID
The current logic relies on a lot of duct tape, with offsets calculated from a define holding the number of source IDs plus an enum. Rewrite the logic to be more resilient to code changes, by moving the source ID count into the enum and making the offset calculation more explicit. This change was inspired by a patch from Jonathan Cameron that split the logic to get the CPER address into a separate function, as this will be needed to support generic error injection. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes-stub.c | 3 +- hw/acpi/ghes.c | 204 +++ hw/arm/virt-acpi-build.c | 5 +- include/hw/acpi/ghes.h | 17 ++-- 4 files changed, 133 insertions(+), 96 deletions(-) diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c index c315de1802d6..8762449870b5 100644 --- a/hw/acpi/ghes-stub.c +++ b/hw/acpi/ghes-stub.c @@ -11,7 +11,8 @@ #include "qemu/osdep.h" #include "hw/acpi/ghes.h" -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address) +int acpi_ghes_record_errors(enum AcpiGhesNotifyType notify, +uint64_t physical_address) { return -1; } diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 280674452a60..f93499d7d647 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -28,14 +28,23 @@ #include "hw/nvram/fw_cfg.h" #include "qemu/uuid.h" -#define ACPI_GHES_ERRORS_FW_CFG_FILE"etc/hardware_errors" -#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr" +#define ACPI_HW_ERROR_FW_CFG_FILE "etc/hardware_errors" +#define ACPI_HW_ERROR_ADDR_FW_CFG_FILE "etc/hardware_errors_addr" +#define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr" /* The max size in bytes for one error block */ #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB) -/* Support ARMv8 SEA notification type error source and GPIO interrupt.
*/ -#define ACPI_GHES_ERROR_SOURCE_COUNT2 +/* + * ID numbers used to fill HEST source ID field + */ +enum AcpiHestSourceId { +ACPI_HEST_SRC_ID_SEA, +ACPI_HEST_SRC_ID_GED, + +/* Shall be the last one */ +ACPI_HEST_SRC_ID_COUNT +} AcpiHestSourceId; /* Generic Hardware Error Source version 2 */ #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10 @@ -63,6 +72,15 @@ */ #define ACPI_GHES_GESB_SIZE 20 +/* + * Offsets with regards to the start of the HEST table stored at + * ags->hest_addr_le, according with the memory layout map at + * docs/specs/acpi_hest_ghes.rst. + */ +#define ACPI_HEST_TABLE_SIZE 40 +#define HEST_GHES_V2_TABLE_SIZE 92 +#define HEST_ACK_OFFSET (68 + ACPI_HEST_TABLE_SIZE) + /* * Values for error_severity field */ @@ -236,17 +254,17 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address, * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs. * See docs/specs/acpi_hest_ghes.rst for blobs format. */ -void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) +static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) { int i, error_status_block_offset; /* Build error_block_address */ -for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) { +for (i = 0; i < ACPI_HEST_SRC_ID_COUNT; i++) { build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t)); } /* Build read_ack_register */ -for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) { +for (i = 0; i < ACPI_HEST_SRC_ID_COUNT; i++) { /* * Initialize the value of read_ack_register to 1, so GHES can be * writable after (re)boot. 
@@ -261,20 +279,20 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) /* Reserve space for Error Status Data Block */ acpi_data_push(hardware_errors, -ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT); +ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_HEST_SRC_ID_COUNT); /* Tell guest firmware to place hardware_errors blob into RAM */ -bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE, +bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE, hardware_errors, sizeof(uint64_t), false); -for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) { +for (i = 0; i < ACPI_HEST_SRC_ID_COUNT; i++) { /* * Tell firmware to patch error_block_address entries to point to * corresponding "Generic Error Status Block" */ bios_linker_loader_add_pointer(linker, -ACPI_GHES_ERRORS_FW_CFG_FILE, sizeof(uint64_t) * i, -sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, +ACPI_HW_ERROR_FW_CFG_FILE, sizeof(uint64_t) * i, +sizeof(uint64_t), ACPI_HW_ERROR_FW_CFG_FILE, error_status_block
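The source IDs now live in an enum whose last member is the count. The offsets inside the etc/hardware_errors blob therefore follow directly from build_ghes_error_table(): COUNT error-block addresses, then COUNT read-ack registers, then COUNT status blocks of ACPI_GHES_MAX_RAW_DATA_LENGTH bytes each. Below is a minimal Python sketch of that arithmetic. The helper name is hypothetical, and the constants mirror the patch.

```python
ACPI_GHES_MAX_RAW_DATA_LENGTH = 1024  # 1 KiB, as in hw/acpi/ghes.c
U64 = 8  # sizeof(uint64_t)

# Mirrors enum AcpiHestSourceId from the patch; COUNT must stay last
ACPI_HEST_SRC_ID_SEA, ACPI_HEST_SRC_ID_GED, ACPI_HEST_SRC_ID_COUNT = range(3)


def blob_offsets(source_id: int, count: int = ACPI_HEST_SRC_ID_COUNT) -> dict:
    """Byte offsets inside etc/hardware_errors for one error source."""
    if not 0 <= source_id < count:
        raise ValueError(f"invalid source id {source_id}")
    return {
        # first table: one error_block_address per source
        "error_block_address": U64 * source_id,
        # second table: one read_ack_register per source
        "read_ack_register": U64 * count + U64 * source_id,
        # then the Error Status Data Blocks themselves
        "error_status_block": 2 * U64 * count
                              + ACPI_GHES_MAX_RAW_DATA_LENGTH * source_id,
    }
```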
[PATCH v7 00/10] Add ACPI CPER firmware first error injection on ARM emulation
:0x20 [ 899.194273] {5}[Hardware Error]: access mode: secure [ 899.194544] {5}[Hardware Error]: Error info structure 3: [ 899.194838] {5}[Hardware Error]: num errors: 2 [ 899.195088] {5}[Hardware Error]:error_type: 0x10: micro-architectural error [ 899.195456] {5}[Hardware Error]:error_info: 0x78da03ff [ 899.195782] {5}[Hardware Error]: Error info structure 4: [ 899.196070] {5}[Hardware Error]: num errors: 2 [ 899.196331] {5}[Hardware Error]:error_type: 0x14: TLB error|micro-architectural error [ 899.196733] {5}[Hardware Error]: Context info structure 0: [ 899.197024] {5}[Hardware Error]:register context type: AArch64 EL1 context registers [ 899.197427] {5}[Hardware Error]:: [ 899.197741] {5}[Hardware Error]: Vendor specific error info has 5 bytes: [ 899.198096] {5}[Hardware Error]:: 13 7b 04 05 01 .{... [ 899.198610] [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error [ 899.199000] [Firmware Warn]: GHES: Unhandled processor error type 0x04: TLB error [ 899.199388] [Firmware Warn]: GHES: Unhandled processor error type 0x08: bus error [ 899.199767] [Firmware Warn]: GHES: Unhandled processor error type 0x10: micro-architectural error [ 899.200194] [Firmware Warn]: GHES: Unhandled processor error type 0x14: TLB error|micro-architectural error --- v7: - Change the way offsets are calculated and used on HEST table. Now, it is compatible with migrations as all offsets are relative to the HEST table; - GHES interface is now more generic: the entire CPER is sent via QMP, instead of just the payload; - Some code cleanups to make the code more robust; - The python script now uses QEMUMonitorProtocol class. 
v6: - PNP0C33 device creation moved to aml-build.c; - acpi_ghes record functions now use ACPI notify parameter, instead of source ID; - the number of source IDs is now automatically calculated; - some code cleanups and function/var renames; - some fixes and cleanups at the error injection script; - ghes cper stub now produces an error if cper JSON is not compiled; - Offset calculation logic for GHES was refactored; - Updated documentation to reflect the GHES allocated size; - Added a x-mpidr object for QOM usage; - Added a patch making usage of x-mpidr field at ARM injection script; v5: - CPER guid is now passing as string; - raw-data is now passed with base64 encode; - Removed several GPIO left-overs from arm/virt.c changes; - Lots of cleanups and improvements at the error injection script. It now better handles QMP dialog and doesn't print debug messages. Also, code was split on two modules, to make easier to add more error injection commands. v4: - CPER generation moved to happen outside QEMU; - One patch adding support for mpidr query was removed. v3: - patch 1 cleanups with some comment changes and adding another place where the poweroff GPIO define should be used. No changes on other patches (except due to conflict resolution). v2: - added a new patch using a define for GPIO power pin; - patch 2 changed to also use a define for generic error GPIO pin; - a couple cleanups at patch 2 removing uneeded else clauses. 
Jonathan Cameron (1): acpi/ghes: Add support for GED error device Mauro Carvalho Chehab (9): acpi/generic_event_device: add an APEI error device arm/virt: Wire up a GED error device for ACPI / GHES qapi/acpi-hest: add an interface to do generic CPER error injection acpi/ghes: rework the logic to handle HEST source ID acpi/ghes: add support for generic error injection via QAPI docs: acpi_hest_ghes: fix documentation for CPER size scripts/ghes_inject: add a script to generate GHES error inject target/arm: add an experimental mpidr arm cpu property object scripts/arm_processor_error.py: retrieve mpidr if not filled MAINTAINERS| 10 + docs/specs/acpi_hest_ghes.rst | 6 +- hw/acpi/Kconfig| 5 + hw/acpi/aml-build.c| 10 + hw/acpi/generic_event_device.c | 8 + hw/acpi/ghes-stub.c| 3 +- hw/acpi/ghes.c | 250 + hw/acpi/ghes_cper.c| 33 ++ hw/acpi/ghes_cper_stub.c | 19 + hw/acpi/meson.build| 2 + hw/arm/Kconfig | 5 + hw/arm/virt-acpi-build.c | 6 +- hw/arm/virt.c | 12 +- include/hw/acpi/acpi_dev_interface.h | 1 + include/hw/acpi/aml-build.h| 2 + include/hw/acpi/generic_event_device.h | 1 + include/hw/acpi/ghes.h | 23 +- include/hw/arm/virt.h | 1 + qapi/acpi-hest.json| 36 ++ qapi/meson.build | 1 + qapi/qapi-schema.json | 1 + scripts/arm_processor_error.py | 382 +++ sc
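The error_type values in the kernel log above are bitmasks. For example, 0x14 is TLB (0x04) plus micro-architectural (0x10), which is why the kernel prints "TLB error|micro-architectural error". The sketch below decodes such a value. It uses the bit assignments from the pei_error_types table in arm_processor_error.py, and the helper name is hypothetical.

```python
from typing import List

# Bit positions taken from pei_error_types in scripts/arm_processor_error.py
PEI_ERROR_TYPES = {
    "cache": 1 << 1,
    "tlb": 1 << 2,
    "bus": 1 << 3,
    "micro-arch": 1 << 4,
}


def decode_error_type(value: int) -> List[str]:
    """Return the error-type names set in an ARM PEI error_type field."""
    names = [name for name, bit in PEI_ERROR_TYPES.items() if value & bit]
    # Report any bits that the table does not know about
    unknown = value & ~sum(PEI_ERROR_TYPES.values())
    if unknown:
        names.append(f"unknown(0x{unknown:x})")
    return names
```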
[PATCH v7 08/10] scripts/ghes_inject: add a script to generate GHES error inject
Using the QMP GHESv2 API requires preparing a raw data array containing a CPER record. Add a helper script with subcommands to prepare such data. Currently, only ARM Processor error CPER record is supported. Signed-off-by: Mauro Carvalho Chehab --- MAINTAINERS| 3 + scripts/arm_processor_error.py | 371 + scripts/ghes_inject.py | 51 scripts/qmp_helper.py | 486 + 4 files changed, 911 insertions(+) create mode 100644 scripts/arm_processor_error.py create mode 100755 scripts/ghes_inject.py create mode 100644 scripts/qmp_helper.py diff --git a/MAINTAINERS b/MAINTAINERS index 1d8091818899..249ed2858198 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2083,6 +2083,9 @@ S: Maintained F: hw/arm/ghes_cper.c F: hw/acpi/ghes_cper_stub.c F: qapi/acpi-hest.json +F: scripts/ghes_inject.py +F: scripts/arm_processor_error.py +F: scripts/qmp_helper.py ppc4xx L: qemu-...@nongnu.org diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py new file mode 100644 index ..2643e4ddc5f3 --- /dev/null +++ b/scripts/arm_processor_error.py @@ -0,0 +1,371 @@ +#!/usr/bin/env python3 +# +# pylint: disable=C0301,C0114,R0903,R0912,R0913,R0914,R0915,W0511 +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (C) 2024 Mauro Carvalho Chehab + +# TODO: current implementation has dummy defaults. +# +# For a better implementation, a QMP addition/call is needed to +# retrieve some data for ARM Processor Error injection: +# +# - ARM registers: power_state, mpidr. + +import argparse +import re + +from qmp_helper import qmp, util, cper_guid + +class ArmProcessorEinj: +""" +Implements ARM Processor Error injection via GHES +""" + +DESC = """ +Generates an ARM processor error CPER, compatible with +UEFI 2.9A Errata. 
+""" + +ACPI_GHES_ARM_CPER_LENGTH = 40 +ACPI_GHES_ARM_CPER_PEI_LENGTH = 32 + +# Context types +CONTEXT_AARCH32_EL1 = 1 +CONTEXT_AARCH64_EL1 = 5 +CONTEXT_MISC_REG = 8 + +def __init__(self, subparsers): +"""Initialize the error injection class and add subparser""" + +# Valid choice values +self.arm_valid_bits = { +"mpidr":util.bit(0), +"affinity": util.bit(1), +"running": util.bit(2), +"vendor": util.bit(3), +} + +self.pei_flags = { +"first":util.bit(0), +"last": util.bit(1), +"propagated": util.bit(2), +"overflow": util.bit(3), +} + +self.pei_error_types = { +"cache":util.bit(1), +"tlb": util.bit(2), +"bus": util.bit(3), +"micro-arch": util.bit(4), +} + +self.pei_valid_bits = { +"multiple-error": util.bit(0), +"flags":util.bit(1), +"error-info": util.bit(2), +"virt-addr":util.bit(3), +"phy-addr": util.bit(4), +} + +self.data = bytearray() + +parser = subparsers.add_parser("arm", description=self.DESC) + +arm_valid_bits = ",".join(self.arm_valid_bits.keys()) +flags = ",".join(self.pei_flags.keys()) +error_types = ",".join(self.pei_error_types.keys()) +pei_valid_bits = ",".join(self.pei_valid_bits.keys()) + +# UEFI N.16 ARM Validation bits +g_arm = parser.add_argument_group("ARM processor") +g_arm.add_argument("--arm", "--arm-valid", + help=f"ARM valid bits: {arm_valid_bits}") +g_arm.add_argument("-a", "--affinity", "--level", "--affinity-level", + type=lambda x: int(x, 0), + help="Affinity level (when multiple levels apply)") +g_arm.add_argument("-l", "--mpidr", type=lambda x: int(x, 0), + help="Multiprocessor Affinity Register") +g_arm.add_argument("-i", "--midr", type=lambda x: int(x, 0), + help="Main ID Register") +g_arm.add_argument("-r", "--running", + action=argparse.BooleanOptionalAction, + default=None, + help="Indicates if the processor is running or not") +g_arm.add_argument("--psci", "--psci-state", + type=lambda x
[PATCH v7 03/10] acpi/ghes: Add support for GED error device
From: Jonathan Cameron As a GED error device is now defined, add another type of notification: error notification to GHES v2 through the GED error device, triggered via the GED interrupt. [mchehab: do some cleanups at ACPI_HEST_SRC_ID_* checks and rename HEST event to better identify GED interrupt OSPM] Signed-off-by: Jonathan Cameron Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Igor Mammedov --- hw/acpi/ghes.c | 12 +--- include/hw/acpi/ghes.h | 3 ++- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 13b105c5d02d..280674452a60 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -34,8 +34,8 @@ /* The max size in bytes for one error block */ #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB) -/* Now only support ARMv8 SEA notification type error source */ -#define ACPI_GHES_ERROR_SOURCE_COUNT1 +/* Support ARMv8 SEA notification type error source and GPIO interrupt. */ +#define ACPI_GHES_ERROR_SOURCE_COUNT2 /* Generic Hardware Error Source version 2 */ #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10 @@ -290,6 +290,9 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker) { uint64_t address_offset; + +assert(source_id < ACPI_HEST_SRC_ID_RESERVED); + /* * Type: * Generic Hardware Error Source version 2(GHESv2 - Type 10) @@ -327,6 +330,9 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker) */ build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA); break; +case ACPI_HEST_SRC_ID_GED: +build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_GPIO); +break; default: error_report("Not support this error source"); abort(); @@ -370,6 +376,7 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker, /* Error Source Count */ build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4); build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker); +build_ghes_v2(table_data,
ACPI_HEST_SRC_ID_GED, linker); acpi_table_end(linker, &table); } @@ -406,7 +413,6 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address) start_addr = le64_to_cpu(ags->ghes_addr_le); if (physical_address) { - if (source_id < ACPI_HEST_SRC_ID_RESERVED) { start_addr += source_id * sizeof(uint64_t); } diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index fb80897e7eac..419a97d5cbd9 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -59,9 +59,10 @@ enum AcpiGhesNotifyType { ACPI_GHES_NOTIFY_RESERVED = 12 }; +/* Those are used as table indexes when building GHES tables */ enum { ACPI_HEST_SRC_ID_SEA = 0, -/* future ids go here */ +ACPI_HEST_SRC_ID_GED, ACPI_HEST_SRC_ID_RESERVED, }; -- 2.46.0
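After this patch, each HEST source ID is tied to one notification type. SEA uses the ARMv8 SEA notification, and the new GED source uses the GPIO-signal notification. Patch 06 later looks this mapping up in reverse through ghes_notify_to_source_id(), whose body is not shown in this excerpt. The sketch below is a hedged Python model of that reverse lookup. It is not the QEMU implementation. The values 7 and 8 are the ACPI HEST notification types for GPIO-signal and ARMv8 SEA.

```python
# ACPI HEST Hardware Error Notification Type values
ACPI_GHES_NOTIFY_GPIO = 7
ACPI_GHES_NOTIFY_SEA = 8

# Source IDs as ordered in the patch's enum
ACPI_HEST_SRC_ID_SEA = 0
ACPI_HEST_SRC_ID_GED = 1

# One notification type per source, as wired up in build_ghes_v2()
SOURCE_TO_NOTIFY = {
    ACPI_HEST_SRC_ID_SEA: ACPI_GHES_NOTIFY_SEA,
    ACPI_HEST_SRC_ID_GED: ACPI_GHES_NOTIFY_GPIO,
}


def notify_to_source_id(notify: int) -> int:
    """Reverse lookup: find the HEST source that uses a notification type."""
    for source, source_notify in SOURCE_TO_NOTIFY.items():
        if source_notify == notify:
            return source
    raise ValueError(f"no HEST error source for notify type {notify}")
```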
[PATCH v7 04/10] qapi/acpi-hest: add an interface to do generic CPER error injection
Create a QMP command to be used for generic ACPI APEI hardware error injection (HEST) via GHESv2. The actual GHES code will be added in a follow-up patch. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Shiju Jose Reviewed-by: Jonathan Cameron --- MAINTAINERS | 7 +++ hw/acpi/Kconfig | 5 + hw/acpi/ghes_cper.c | 33 + hw/acpi/ghes_cper_stub.c | 19 +++ hw/acpi/meson.build | 2 ++ hw/arm/Kconfig | 5 + include/hw/acpi/ghes.h | 1 + qapi/acpi-hest.json | 36 qapi/meson.build | 1 + qapi/qapi-schema.json| 1 + 10 files changed, 110 insertions(+) create mode 100644 hw/acpi/ghes_cper.c create mode 100644 hw/acpi/ghes_cper_stub.c create mode 100644 qapi/acpi-hest.json diff --git a/MAINTAINERS b/MAINTAINERS index 3584d6a6c6da..1d8091818899 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2077,6 +2077,13 @@ F: hw/acpi/ghes.c F: include/hw/acpi/ghes.h F: docs/specs/acpi_hest_ghes.rst +ACPI/HEST/GHES/ARM processor CPER +R: Mauro Carvalho Chehab +S: Maintained +F: hw/arm/ghes_cper.c +F: hw/acpi/ghes_cper_stub.c +F: qapi/acpi-hest.json + ppc4xx L: qemu-...@nongnu.org S: Orphan diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig index e07d3204eb36..73ffbb82c150 100644 --- a/hw/acpi/Kconfig +++ b/hw/acpi/Kconfig @@ -51,6 +51,11 @@ config ACPI_APEI bool depends on ACPI +config GHES_CPER +bool +depends on ACPI_APEI +default y + config ACPI_PCI bool depends on ACPI && PCI diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c new file mode 100644 index ..92ca84d738de --- /dev/null +++ b/hw/acpi/ghes_cper.c @@ -0,0 +1,33 @@ +/* + * CPER payload parser for error injection + * + * Copyright(C) 2024 Huawei LTD. + * + * This code is licensed under the GPL version 2 or later. See the + * COPYING file in the top-level directory.
+ * + */ + +#include "qemu/osdep.h" + +#include "qemu/base64.h" +#include "qemu/error-report.h" +#include "qemu/uuid.h" +#include "qapi/qapi-commands-acpi-hest.h" +#include "hw/acpi/ghes.h" + +void qmp_ghes_cper(const char *qmp_cper, + Error **errp) +{ + +uint8_t *cper; +size_t len; + +cper = qbase64_decode(qmp_cper, -1, &len, errp); +if (!cper) { +error_setg(errp, "missing GHES CPER payload"); +return; +} + +/* TODO: call a function at ghes */ +} diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c new file mode 100644 index ..36138c462ac9 --- /dev/null +++ b/hw/acpi/ghes_cper_stub.c @@ -0,0 +1,19 @@ +/* + * Stub interface for CPER payload parser for error injection + * + * Copyright(C) 2024 Huawei LTD. + * + * This code is licensed under the GPL version 2 or later. See the + * COPYING file in the top-level directory. + * + */ + +#include "qemu/osdep.h" +#include "qapi/error.h" +#include "qapi/qapi-commands-acpi-hest.h" +#include "hw/acpi/ghes.h" + +void qmp_ghes_cper(const char *cper, Error **errp) +{ +error_setg(errp, "GHES QMP error inject is not compiled in"); +} diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build index fa5c07db9068..6cbf430eb66d 100644 --- a/hw/acpi/meson.build +++ b/hw/acpi/meson.build @@ -34,4 +34,6 @@ endif system_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c', 'acpi_interface.c')) system_ss.add(when: 'CONFIG_ACPI_PCI_BRIDGE', if_false: files('pci-bridge-stub.c')) system_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss) +system_ss.add(when: 'CONFIG_GHES_CPER', if_true: files('ghes_cper.c')) +system_ss.add(when: 'CONFIG_GHES_CPER', if_false: files('ghes_cper_stub.c')) system_ss.add(files('acpi-qmp-cmds.c')) diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index 1ad60da7aa2d..bed6ba27d715 100644 --- a/hw/arm/Kconfig +++ b/hw/arm/Kconfig @@ -712,3 +712,8 @@ config ARMSSE select UNIMP select SSE_COUNTER select SSE_TIMER + +config GHES_CPER +bool +depends on ARM +default y if AARCH64 diff 
--git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index 419a97d5cbd9..99d12d69c864 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -23,6 +23,7 @@ #define ACPI_GHES_H #include "hw/acpi/bios-linker-loader.h" +#include "qapi/error.h" #include "qemu/notify.h" extern NotifierList acpi_generic_error_notifiers; diff --git a/qapi/acpi-hest.json b/qapi/acpi-hest.json new file mode 100644 index ..91296755d285 --- /dev/null +++ b/qapi/acpi-hest.json @@ -0,0 +1,36 @@ +# -*- Mode: Python -*- +# vim: filetype=python + +## +# = GHESv2 CPER Error Injection +# +# Defined since ACPI Specification
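On the host side, the CPER record must be base64-encoded before it reaches qmp_ghes_cper(), which decodes it with qbase64_decode(). Below is a minimal sketch of building such a QMP message. The command name ghes-cper is inferred from QEMU's qmp_<command> naming convention. The argument name "cper" is an assumption, because qapi/acpi-hest.json is truncated in this excerpt. As with any QMP session, the qmp_capabilities handshake has to be sent first.

```python
import base64
import json


def build_ghes_cper_cmd(cper: bytes) -> str:
    """Wrap a raw CPER record in a QMP command (hypothetical helper).

    Assumption: command "ghes-cper" with a "cper" argument holding the
    base64-encoded record; check qapi/acpi-hest.json for the real schema.
    """
    return json.dumps({
        "execute": "ghes-cper",
        "arguments": {"cper": base64.b64encode(cper).decode("ascii")},
    })
```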
[PATCH v7 02/10] arm/virt: Wire up a GED error device for ACPI / GHES
Adds support to ARM virtualization to allow handling generic error ACPI Event via GED & error source device. It is aligned with Linux Kernel patch: https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.hu...@intel.com/ Co-authored-by: Mauro Carvalho Chehab Co-authored-by: Jonathan Cameron Signed-off-by: Jonathan Cameron Signed-off-by: Mauro Carvalho Chehab Acked-by: Igor Mammedov --- hw/acpi/ghes.c | 3 +++ hw/arm/virt-acpi-build.c | 1 + hw/arm/virt.c| 12 +++- include/hw/acpi/ghes.h | 3 +++ include/hw/arm/virt.h| 1 + 5 files changed, 19 insertions(+), 1 deletion(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index e9511d9b8f71..13b105c5d02d 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -444,6 +444,9 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address) return ret; } +NotifierList acpi_generic_error_notifiers = +NOTIFIER_LIST_INITIALIZER(error_device_notifiers); + bool acpi_ghes_present(void) { AcpiGedState *acpi_ged_state; diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index f76fb117adff..1769467d23b2 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -858,6 +858,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) } acpi_dsdt_add_power_button(scope); +aml_append(scope, aml_error_device()); #ifdef CONFIG_TPM acpi_dsdt_add_tpm(scope, vms); #endif diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 687fe0bb8bc9..22448e5c5b73 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -677,7 +677,7 @@ static inline DeviceState *create_acpi_ged(VirtMachineState *vms) DeviceState *dev; MachineState *ms = MACHINE(vms); int irq = vms->irqmap[VIRT_ACPI_GED]; -uint32_t event = ACPI_GED_PWR_DOWN_EVT; +uint32_t event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT; if (ms->ram_slots) { event |= ACPI_GED_MEM_HOTPLUG_EVT; @@ -1009,6 +1009,13 @@ static void virt_powerdown_req(Notifier *n, void *opaque) } } +static void virt_generic_error_req(Notifier *n, void *opaque) +{ +VirtMachineState 
*s = container_of(n, VirtMachineState, generic_error_notifier); + +acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR); +} + static void create_gpio_keys(char *fdt, DeviceState *pl061_dev, uint32_t phandle) { @@ -2385,6 +2392,9 @@ static void machvirt_init(MachineState *machine) if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) { vms->acpi_dev = create_acpi_ged(vms); +vms->generic_error_notifier.notify = virt_generic_error_req; +notifier_list_add(&acpi_generic_error_notifiers, + &vms->generic_error_notifier); } else { create_gpio_devices(vms, VIRT_GPIO, sysmem); } diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index 674f6958e905..fb80897e7eac 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -23,6 +23,9 @@ #define ACPI_GHES_H #include "hw/acpi/bios-linker-loader.h" +#include "qemu/notify.h" + +extern NotifierList acpi_generic_error_notifiers; /* * Values for Hardware Error Notification Type field diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index a4d937ed45ac..ad9f6e94dcc5 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -175,6 +175,7 @@ struct VirtMachineState { DeviceState *gic; DeviceState *acpi_dev; Notifier powerdown_notifier; +Notifier generic_error_notifier; PCIBus *bus; char *oem_id; char *oem_table_id; -- 2.46.0
[PATCH v7 06/10] acpi/ghes: add support for generic error injection via QAPI
Provide a generic interface for error injection via GHESv2. This patch is co-authored: - original ghes logic to inject a simple ARM record by Shiju Jose; - generic logic to handle block addresses by Jonathan Cameron; - generic GHESv2 error inject by Mauro Carvalho Chehab; Co-authored-by: Jonathan Cameron Co-authored-by: Shiju Jose Co-authored-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Cameron Signed-off-by: Shiju Jose Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 57 ++ hw/acpi/ghes_cper.c| 2 +- include/hw/acpi/ghes.h | 3 +++ 3 files changed, 61 insertions(+), 1 deletion(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index f93499d7d647..2f6b50d57ed2 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -495,6 +495,63 @@ int acpi_ghes_record_errors(enum AcpiGhesNotifyType notify, NotifierList acpi_generic_error_notifiers = NOTIFIER_LIST_INITIALIZER(error_device_notifiers); +void ghes_record_cper_errors(uint8_t *cper, size_t len, + enum AcpiGhesNotifyType notify, Error **errp) +{ +uint64_t cper_addr, read_ack_start_addr; +enum AcpiHestSourceId source; +AcpiGedState *acpi_ged_state; +AcpiGhesState *ags; +uint64_t read_ack; + +if (ghes_notify_to_source_id(notify, &source)) { +error_setg(errp, + "GHES: Invalid error block/ack address(es) for notify %d", + notify); +return; +} + +acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, + NULL)); +g_assert(acpi_ged_state); +ags = &acpi_ged_state->ghes_state; + +cper_addr = le64_to_cpu(ags->ghes_addr_le); +cper_addr += 2 * ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t); +cper_addr += source * ACPI_GHES_MAX_RAW_DATA_LENGTH; + +read_ack_start_addr = le64_to_cpu(ags->hest_addr_le); +read_ack_start_addr += source * HEST_GHES_V2_TABLE_SIZE + HEST_ACK_OFFSET; + +cpu_physical_memory_read(read_ack_start_addr, + &read_ack, sizeof(uint64_t)); + +/* zero means OSPM does not acknowledge the error */ +if (!read_ack) { +error_setg(errp, + "Last CPER record was not acknowledged yet"); +read_ack = 1; 
+cpu_physical_memory_write(read_ack_start_addr, + &read_ack, sizeof(uint64_t)); +return; +} + +read_ack = cpu_to_le64(0); +cpu_physical_memory_write(read_ack_start_addr, + &read_ack, sizeof(uint64_t)); + +/* Build CPER record */ + +if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) { +error_setg(errp, "GHES CPER record is too big: %ld", len); +} + +/* Write the generic error data entry into guest memory */ +cpu_physical_memory_write(cper_addr, cper, len); + +notifier_list_notify(&acpi_generic_error_notifiers, NULL); +} + bool acpi_ghes_present(void) { AcpiGedState *acpi_ged_state; diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c index 92ca84d738de..2328dbff7012 100644 --- a/hw/acpi/ghes_cper.c +++ b/hw/acpi/ghes_cper.c @@ -29,5 +29,5 @@ void qmp_ghes_cper(const char *qmp_cper, return; } -/* TODO: call a function at ghes */ +ghes_record_cper_errors(cper, len, ACPI_GHES_NOTIFY_GPIO, errp); } diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index 2524b5e64624..dacd82c6857e 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -74,6 +74,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s, int acpi_ghes_record_errors(enum AcpiGhesNotifyType notify, uint64_t error_physical_addr); +void ghes_record_cper_errors(uint8_t *cper, size_t len, + enum AcpiGhesNotifyType notify,Error **errp); + /** * acpi_ghes_present: Report whether ACPI GHES table is present * -- 2.46.0
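ghes_record_cper_errors() relies on a read-ack handshake with the guest. A zero in the ack register means the OSPM has not yet consumed the previous record. A non-zero value lets QEMU clear the register and write the new record into the source's error status block. The Python model below uses a bytearray in place of guest memory. It is a sketch, not the QEMU code: record_cper and GhesInjectError are hypothetical, and the GED notification step is left out. Unlike the order in the patch, the sketch checks the record size before touching the ack register and stops there, so an oversized record is never written past the 1 KiB block.

```python
import struct

ACPI_GHES_MAX_RAW_DATA_LENGTH = 1024  # assumption: matches hw/acpi/ghes.c


class GhesInjectError(Exception):
    """Raised when a record cannot be injected (hypothetical)."""


def record_cper(mem: bytearray, ack_addr: int, cper_addr: int, cper: bytes) -> None:
    """Model of the read-ack handshake; mem stands in for guest RAM."""
    # Validate the size first so a rejected record leaves the ack untouched
    if len(cper) > ACPI_GHES_MAX_RAW_DATA_LENGTH:
        raise GhesInjectError(f"GHES CPER record is too big: {len(cper)}")

    # The ack register is a little-endian 64-bit value
    (read_ack,) = struct.unpack_from("<Q", mem, ack_addr)
    if read_ack == 0:
        # Previous record not acknowledged: re-arm the ack, as the patch does
        struct.pack_into("<Q", mem, ack_addr, 1)
        raise GhesInjectError("Last CPER record was not acknowledged yet")

    # Clear the ack: the guest sets it again once it has read the record
    struct.pack_into("<Q", mem, ack_addr, 0)
    mem[cper_addr:cper_addr + len(cper)] = cper
```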
[PATCH v7 01/10] acpi/generic_event_device: add an APEI error device
Adds a generic error device to handle generic hardware error events as specified at ACPI 6.5 specification at 18.3.2.7.2: https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources using HID PNP0C33. The PNP0C33 device is used to report hardware errors to the guest via ACPI APEI Generic Hardware Error Source (GHES). Co-authored-by: Mauro Carvalho Chehab Co-authored-by: Jonathan Cameron Signed-off-by: Jonathan Cameron Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Igor Mammedov --- hw/acpi/aml-build.c| 10 ++ hw/acpi/generic_event_device.c | 8 include/hw/acpi/acpi_dev_interface.h | 1 + include/hw/acpi/aml-build.h| 2 ++ include/hw/acpi/generic_event_device.h | 1 + 5 files changed, 22 insertions(+) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index 6d4517cfbe3d..cb167523859f 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -2520,3 +2520,13 @@ Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source) return var; } + +/* ACPI 5.0: 18.3.2.6.2 Event Notification For Generic Error Sources */ +Aml *aml_error_device(void) +{ +Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE); +aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33"))); +aml_append(dev, aml_name_decl("_UID", aml_int(0))); + +return dev; +} diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c index 15b4c3ebbf24..1673e9695be3 100644 --- a/hw/acpi/generic_event_device.c +++ b/hw/acpi/generic_event_device.c @@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = { ACPI_GED_PWR_DOWN_EVT, ACPI_GED_NVDIMM_HOTPLUG_EVT, ACPI_GED_CPU_HOTPLUG_EVT, +ACPI_GED_ERROR_EVT }; /* @@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev, aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE), aml_int(0x80))); break; +case ACPI_GED_ERROR_EVT: +aml_append(if_ctx, + aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE), + aml_int(0x80))); +break; case ACPI_GED_NVDIMM_HOTPLUG_EVT: 
aml_append(if_ctx, aml_notify(aml_name("\\_SB.NVDR"), @@ -295,6 +301,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev) sel = ACPI_GED_MEM_HOTPLUG_EVT; } else if (ev & ACPI_POWER_DOWN_STATUS) { sel = ACPI_GED_PWR_DOWN_EVT; +} else if (ev & ACPI_GENERIC_ERROR) { +sel = ACPI_GED_ERROR_EVT; } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) { sel = ACPI_GED_NVDIMM_HOTPLUG_EVT; } else if (ev & ACPI_CPU_HOTPLUG_STATUS) { diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h index 68d9d15f50aa..8294f8f0ccca 100644 --- a/include/hw/acpi/acpi_dev_interface.h +++ b/include/hw/acpi/acpi_dev_interface.h @@ -13,6 +13,7 @@ typedef enum { ACPI_NVDIMM_HOTPLUG_STATUS = 16, ACPI_VMGENID_CHANGE_STATUS = 32, ACPI_POWER_DOWN_STATUS = 64, +ACPI_GENERIC_ERROR = 128, } AcpiEventStatusBits; #define TYPE_ACPI_DEVICE_IF "acpi-device-interface" diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h index a3784155cb33..44d1a6af0c69 100644 --- a/include/hw/acpi/aml-build.h +++ b/include/hw/acpi/aml-build.h @@ -252,6 +252,7 @@ struct CrsRangeSet { /* Consumer/Producer */ #define AML_SERIAL_BUS_FLAG_CONSUME_ONLY(1 << 1) +#define ACPI_APEI_ERROR_DEVICE "GEDD" /** * init_aml_allocator: * @@ -382,6 +383,7 @@ Aml *aml_dma(AmlDmaType typ, AmlDmaBusMaster bm, AmlTransferSize sz, uint8_t channel); Aml *aml_sleep(uint64_t msec); Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source); +Aml *aml_error_device(void); /* Block AML object primitives */ Aml *aml_scope(const char *name_format, ...) 
G_GNUC_PRINTF(1, 2); diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h index 40af3550b56d..9ace8fe70328 100644 --- a/include/hw/acpi/generic_event_device.h +++ b/include/hw/acpi/generic_event_device.h @@ -98,6 +98,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED) #define ACPI_GED_PWR_DOWN_EVT 0x2 #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4 #define ACPI_GED_CPU_HOTPLUG_EVT0x8 +#define ACPI_GED_ERROR_EVT 0x10 typedef struct GEDState { MemoryRegion evt; -- 2.46.0
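acpi_ged_send_event() translates an AcpiEventStatusBits value into a single GED selector bit, checking the bits in a fixed order. The new ACPI_GENERIC_ERROR (128) therefore becomes ACPI_GED_ERROR_EVT (0x10), which the GED AML forwards as Notify(GEDD, 0x80). The sketch reproduces only the part of the chain visible in this patch, with memory and CPU hotplug omitted. Returning None for an unmatched event is this sketch's choice.

```python
from typing import Optional

# AcpiEventStatusBits values shown in acpi_dev_interface.h (subset)
ACPI_NVDIMM_HOTPLUG_STATUS = 16
ACPI_POWER_DOWN_STATUS = 64
ACPI_GENERIC_ERROR = 128

# GED selector bits from generic_event_device.h (subset)
ACPI_GED_PWR_DOWN_EVT = 0x2
ACPI_GED_NVDIMM_HOTPLUG_EVT = 0x4
ACPI_GED_ERROR_EVT = 0x10

# Same relative priority as the if/else chain in acpi_ged_send_event()
EVENT_MAP = [
    (ACPI_POWER_DOWN_STATUS, ACPI_GED_PWR_DOWN_EVT),
    (ACPI_GENERIC_ERROR, ACPI_GED_ERROR_EVT),
    (ACPI_NVDIMM_HOTPLUG_STATUS, ACPI_GED_NVDIMM_HOTPLUG_EVT),
]


def ged_selector(ev: int) -> Optional[int]:
    """Return the GED selector for the first matching status bit."""
    for status, sel in EVENT_MAP:
        if ev & status:
            return sel
    return None  # no GED event for this status
```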
Re: [PATCH v6 00/10] Add ACPI CPER firmware first error injection on ARM emulation
On Mon, 12 Aug 2024 14:18:35 +0200, Igor Mammedov wrote: > On Thu, 8 Aug 2024 14:57:35 +0200 > Mauro Carvalho Chehab wrote: > > > On Thu, 8 Aug 2024 14:26:26 +0200 > > Mauro Carvalho Chehab wrote: > > > > > v6: > > > - PNP0C33 device creation moved to aml-build.c; > > > - acpi_ghes record functions now use ACPI notify parameter, > > > instead of source ID; > > > - the number of source IDs is now automatically calculated; > > > - some code cleanups and function/var renames; > > > - some fixes and cleanups at the error injection script; > > > - ghes cper stub now produces an error if cper JSON is not compiled; > > > - Offset calculation logic for GHES was refactored; > > > - Updated documentation to reflect the GHES allocated size; > > > - Added a x-mpidr object for QOM usage; > > > - Added a patch making usage of x-mpidr field at ARM injection > > > script; > > stopping review at 5/10 and expecting a version with > GHES source to error status block mapping fetched from > HEST in guest RAM, instead of pre-calculated offsets > in source code (as in this series) to avoid migration > issues and keeping compat plumbing manageable down the road. Done. I sent version 7 addressing it, along with the other feedback received. In this version, only two offsets are used during error injection: 1) the ack offset, relative to the HEST table; 2) the CPER data offset, relative to the etc/hardware_errors table. Thanks, Mauro
Re: [PATCH v7 04/10] qapi/acpi-hest: add an interface to do generic CPER error injection
On Wed, 14 Aug 2024 14:53:22 +0100, Jonathan Cameron wrote: > On Wed, 14 Aug 2024 01:23:26 +0200 > Mauro Carvalho Chehab wrote: > > > Creates a QMP command to be used for generic ACPI APEI hardware error > > injection (HEST) via GHESv2. > > > > The actual GHES code will be added at the followup patch. > > > > Signed-off-by: Mauro Carvalho Chehab > > Signed-off-by: Shiju Jose > > Reviewed-by: Jonathan Cameron > > A few trivial things from a quick glance at this > (to remind myself of how this fits together). > > > diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig > > index e07d3204eb36..73ffbb82c150 100644 > > --- a/hw/acpi/Kconfig > > +++ b/hw/acpi/Kconfig > > @@ -51,6 +51,11 @@ config ACPI_APEI > > bool > > depends on ACPI > > > > +config GHES_CPER > > +bool > > +depends on ACPI_APEI > > +default y > > + > > config ACPI_PCI > > bool > > depends on ACPI && PCI > > diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c > > new file mode 100644 > > index ..92ca84d738de > > --- /dev/null > > +++ b/hw/acpi/ghes_cper.c > > @@ -0,0 +1,33 @@ > > > +#include "qapi/qapi-commands-acpi-hest.h" > > +#include "hw/acpi/ghes.h" > > + > > +void qmp_ghes_cper(const char *qmp_cper, > > + Error **errp) Heh, with all the code changes, this is not a lot simpler than before ;-) I'll address it in the next spin. > That's a very short line wrap. > > > +{ > > + > > +uint8_t *cper; > > +size_t len; > > + > > +cper = qbase64_decode(qmp_cper, -1, &len, errp); > > +if (!cper) { > > +error_setg(errp, "missing GHES CPER payload"); > > +return; > > +} > > + > > +/* TODO: call a function at ghes */ > > +} > > > diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h > > index 419a97d5cbd9..99d12d69c864 100644 > > --- a/include/hw/acpi/ghes.h > > +++ b/include/hw/acpi/ghes.h > > @@ -23,6 +23,7 @@ > > #define ACPI_GHES_H > > > > #include "hw/acpi/bios-linker-loader.h" > > +#include "qapi/error.h" > Odd to have an include added with no other changes in file? > Wrong patch maybe? 
> Or should it be included by a c file instead? Removing it would break compilation. It could be moved to a .c file, but patch 5/10 needs it in ghes.h, as that patch adds this prototype to ghes.h: void ghes_record_cper_errors(uint8_t *cper, size_t len, enum AcpiGhesNotifyType notify, Error **errp); So, instead of moving this include back and forth between the .c and .h files, I opted to put it in its final place right away. > > #include "qemu/notify.h" > > > > extern NotifierList acpi_generic_error_notifiers; > > diff --git a/qapi/acpi-hest.json b/qapi/acpi-hest.json > > Thanks, Mauro
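qmp_ghes_cper() receives the CPER record as a base64 string and turns it back into raw bytes with QEMU's qbase64_decode() before the GHES code can use it. For readers unfamiliar with that step, here is a rough standalone decoder. It is an illustration only, not QEMU's implementation; b64_decode() is a hypothetical name, and it accepts only padded input without whitespace:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Map one base64 character to its 6-bit value, or -1 if it is invalid. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') {
        return c - 'A';
    }
    if (c >= 'a' && c <= 'z') {
        return c - 'a' + 26;
    }
    if (c >= '0' && c <= '9') {
        return c - '0' + 52;
    }
    if (c == '+') {
        return 62;
    }
    if (c == '/') {
        return 63;
    }
    return -1;
}

/*
 * Decode a padded base64 string into a newly allocated buffer.
 * On success, stores the decoded size in *out_len; returns NULL on
 * malformed input or allocation failure. The caller frees the result.
 */
static uint8_t *b64_decode(const char *in, size_t *out_len)
{
    size_t in_len = strlen(in);
    size_t len = 0;
    uint8_t *out;

    if (in_len % 4 != 0) {
        return NULL;
    }
    out = malloc(in_len / 4 * 3 + 1);
    if (!out) {
        return NULL;
    }
    for (size_t i = 0; i < in_len; i += 4) {
        uint32_t quad = 0;
        int pad = 0;

        for (int j = 0; j < 4; j++) {
            char c = in[i + j];
            int v;

            if (c == '=' && j >= 2 && i + 4 == in_len) {
                /* Padding is only allowed in the last two positions */
                pad++;
                v = 0;
            } else {
                v = b64_value(c);
                if (v < 0 || pad) {
                    /* Invalid character, or data after padding */
                    free(out);
                    return NULL;
                }
            }
            quad = (quad << 6) | (uint32_t)v;
        }
        /* Each 4-character group carries up to 3 bytes */
        out[len++] = (uint8_t)(quad >> 16);
        if (pad < 2) {
            out[len++] = (uint8_t)(quad >> 8);
        }
        if (pad < 1) {
            out[len++] = (uint8_t)quad;
        }
    }
    *out_len = len;
    return out;
}
```

Returning NULL on malformed input mirrors how qmp_ghes_cper() treats a NULL result from qbase64_decode() as a missing payload.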
Re: [PATCH v7 05/10] acpi/ghes: rework the logic to handle HEST source ID
On Wed, 14 Aug 2024 01:23:27 +0200, Mauro Carvalho Chehab wrote: This hunk is wrong: > @@ -350,9 +380,10 @@ static void build_ghes_v2(GArray *table_data, int > source_id, BIOSLinker *linker) > build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0, > 4 /* QWord access */, 0); > bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE, > -address_offset + GAS_ADDR_OFFSET, > -sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, > -(ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * sizeof(uint64_t)); > + address_offset + GAS_ADDR_OFFSET, > + sizeof(uint64_t), > + ACPI_BUILD_TABLE_FILE, > + address_offset + GAS_ADDR_OFFSET); > > /* > * Read Ack Preserve field It should be instead: /* * Read Ack Register * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source * version 2 (GHESv2 - Type 10) */ address_offset = table_data->len; build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0, 4 /* QWord access */, 0); bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE, address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t), ACPI_HW_ERROR_FW_CFG_FILE, (ACPI_HEST_SRC_ID_COUNT + source_id) * sizeof(uint64_t)); Funnily enough, error injection was working even with this problem. I'll prepare a v8 with this fix applied. I'll also add an optional patch at the end that double-checks that the links are properly generated, calling abort() if something ever goes wrong. Regards, Mauro
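In the corrected hunk, the pointer patched into the Read Ack Register GAS targets an offset inside the etc/hardware_errors blob (ACPI_HW_ERROR_FW_CFG_FILE), past the block of error block address entries, instead of pointing back into HEST itself. A minimal sketch of that offset calculation, assuming the two-entry source ID enum used in this series; read_ack_entry_offset() is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Source IDs as in enum AcpiHestSourceId from this series */
enum { ACPI_HEST_SRC_ID_SEA, ACPI_HEST_SRC_ID_GED, ACPI_HEST_SRC_ID_COUNT };

/*
 * Hypothetical helper: offset, inside "etc/hardware_errors", of the
 * Read Ack Register entry for source_id. The blob starts with
 * ACPI_HEST_SRC_ID_COUNT 8-byte error block address entries, followed
 * by the same number of 8-byte read ack entries.
 */
static size_t read_ack_entry_offset(int source_id)
{
    assert(source_id >= 0 && source_id < ACPI_HEST_SRC_ID_COUNT);
    return (size_t)(ACPI_HEST_SRC_ID_COUNT + source_id) * sizeof(uint64_t);
}
```

For ACPI_HEST_SRC_ID_GED this gives (2 + 1) * 8 = 24, that is, the second read ack entry.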
Re: [PATCH v7 01/10] acpi/generic_event_device: add an APEI error device
On Wed, 14 Aug 2024 13:33:21 +0100, Jonathan Cameron wrote: > On Wed, 14 Aug 2024 01:23:23 +0200 > Mauro Carvalho Chehab wrote: > > > Adds a generic error device to handle generic hardware error > > events as specified at ACPI 6.5 specification at 18.3.2.7.2: > > https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources > > using HID PNP0C33. > > > > The PNP0C33 device is used to report hardware errors to > > the guest via ACPI APEI Generic Hardware Error Source (GHES). > > > > Co-authored-by: Mauro Carvalho Chehab > > Co-authored-by: Jonathan Cameron > > Signed-off-by: Jonathan Cameron > > Signed-off-by: Mauro Carvalho Chehab > > Reviewed-by: Igor Mammedov > > --- > > hw/acpi/aml-build.c| 10 ++ > > hw/acpi/generic_event_device.c | 8 > > include/hw/acpi/acpi_dev_interface.h | 1 + > > include/hw/acpi/aml-build.h| 2 ++ > > include/hw/acpi/generic_event_device.h | 1 + > > 5 files changed, 22 insertions(+) > > > > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c > > index 6d4517cfbe3d..cb167523859f 100644 > > --- a/hw/acpi/aml-build.c > > +++ b/hw/acpi/aml-build.c > > @@ -2520,3 +2520,13 @@ Aml *aml_i2c_serial_bus_device(uint16_t address, > > const char *resource_source) > > > > return var; > > } > > + > > +/* ACPI 5.0: 18.3.2.6.2 Event Notification For Generic Error Sources */ > > Given this section got a rename maybe the comment should mention old > name and current name for the section? ACPI 6.5 has the same name for the section: 18.3.2.7.2. Event Notification For Generic Error Sources An event notification is recommended for corrected errors where latency in processing error reports is not critical to proper system operation. The implementation of Event notification requires the platform to define a device with PNP ID PNP0C33 in the ACPI namespace, referred to as the error device. Only the section number changed. IMO, the comment is still good enough for finding the section in the docs. 
Btw, in this specific case, the best approach is to use the search box of the Sphinx HTML output and search for PNP0C33 ;-) > > > +Aml *aml_error_device(void) > > +{ > > +Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE); > > +aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33"))); > > +aml_append(dev, aml_name_decl("_UID", aml_int(0))); > > + > > +return dev; > > +} > > diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c > > index 15b4c3ebbf24..1673e9695be3 100644 > > --- a/hw/acpi/generic_event_device.c > > +++ b/hw/acpi/generic_event_device.c > > @@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = { > > ACPI_GED_PWR_DOWN_EVT, > > ACPI_GED_NVDIMM_HOTPLUG_EVT, > > ACPI_GED_CPU_HOTPLUG_EVT, > > +ACPI_GED_ERROR_EVT > > trailing comma missing. I'll add it. Thanks, Mauro
[PATCH v8 04/13] qapi/acpi-hest: add an interface to do generic CPER error injection
Creates a QMP command to be used for generic ACPI APEI hardware error injection (HEST) via GHESv2. The actual GHES code will be added at the followup patch. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Shiju Jose Reviewed-by: Jonathan Cameron --- MAINTAINERS | 7 +++ hw/acpi/Kconfig | 5 + hw/acpi/ghes_cper.c | 33 + hw/acpi/ghes_cper_stub.c | 19 +++ hw/acpi/meson.build | 2 ++ hw/arm/Kconfig | 5 + include/hw/acpi/ghes.h | 3 +++ qapi/acpi-hest.json | 36 qapi/meson.build | 1 + qapi/qapi-schema.json| 1 + 10 files changed, 112 insertions(+) create mode 100644 hw/acpi/ghes_cper.c create mode 100644 hw/acpi/ghes_cper_stub.c create mode 100644 qapi/acpi-hest.json diff --git a/MAINTAINERS b/MAINTAINERS index 3584d6a6c6da..1d8091818899 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2077,6 +2077,13 @@ F: hw/acpi/ghes.c F: include/hw/acpi/ghes.h F: docs/specs/acpi_hest_ghes.rst +ACPI/HEST/GHES/ARM processor CPER +R: Mauro Carvalho Chehab +S: Maintained +F: hw/arm/ghes_cper.c +F: hw/acpi/ghes_cper_stub.c +F: qapi/acpi-hest.json + ppc4xx L: qemu-...@nongnu.org S: Orphan diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig index e07d3204eb36..73ffbb82c150 100644 --- a/hw/acpi/Kconfig +++ b/hw/acpi/Kconfig @@ -51,6 +51,11 @@ config ACPI_APEI bool depends on ACPI +config GHES_CPER +bool +depends on ACPI_APEI +default y + config ACPI_PCI bool depends on ACPI && PCI diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c new file mode 100644 index ..92ca84d738de --- /dev/null +++ b/hw/acpi/ghes_cper.c @@ -0,0 +1,33 @@ +/* + * CPER payload parser for error injection + * + * Copyright(C) 2024 Huawei LTD. + * + * This code is licensed under the GPL version 2 or later. See the + * COPYING file in the top-level directory. 
+ * + */ + +#include "qemu/osdep.h" + +#include "qemu/base64.h" +#include "qemu/error-report.h" +#include "qemu/uuid.h" +#include "qapi/qapi-commands-acpi-hest.h" +#include "hw/acpi/ghes.h" + +void qmp_ghes_cper(const char *qmp_cper, + Error **errp) +{ + +uint8_t *cper; +size_t len; + +cper = qbase64_decode(qmp_cper, -1, &len, errp); +if (!cper) { +error_setg(errp, "missing GHES CPER payload"); +return; +} + +/* TODO: call a function at ghes */ +} diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c new file mode 100644 index ..36138c462ac9 --- /dev/null +++ b/hw/acpi/ghes_cper_stub.c @@ -0,0 +1,19 @@ +/* + * Stub interface for CPER payload parser for error injection + * + * Copyright(C) 2024 Huawei LTD. + * + * This code is licensed under the GPL version 2 or later. See the + * COPYING file in the top-level directory. + * + */ + +#include "qemu/osdep.h" +#include "qapi/error.h" +#include "qapi/qapi-commands-acpi-hest.h" +#include "hw/acpi/ghes.h" + +void qmp_ghes_cper(const char *cper, Error **errp) +{ +error_setg(errp, "GHES QMP error inject is not compiled in"); +} diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build index fa5c07db9068..6cbf430eb66d 100644 --- a/hw/acpi/meson.build +++ b/hw/acpi/meson.build @@ -34,4 +34,6 @@ endif system_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c', 'acpi_interface.c')) system_ss.add(when: 'CONFIG_ACPI_PCI_BRIDGE', if_false: files('pci-bridge-stub.c')) system_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss) +system_ss.add(when: 'CONFIG_GHES_CPER', if_true: files('ghes_cper.c')) +system_ss.add(when: 'CONFIG_GHES_CPER', if_false: files('ghes_cper_stub.c')) system_ss.add(files('acpi-qmp-cmds.c')) diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index 1ad60da7aa2d..bed6ba27d715 100644 --- a/hw/arm/Kconfig +++ b/hw/arm/Kconfig @@ -712,3 +712,8 @@ config ARMSSE select UNIMP select SSE_COUNTER select SSE_TIMER + +config GHES_CPER +bool +depends on ARM +default y if AARCH64 diff 
--git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index 419a97d5cbd9..b977d65564ba 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -23,6 +23,7 @@ #define ACPI_GHES_H #include "hw/acpi/bios-linker-loader.h" +#include "qapi/error.h" #include "qemu/notify.h" extern NotifierList acpi_generic_error_notifiers; @@ -77,6 +78,8 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker, void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s, GArray *hardware_errors); int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr); +void ghes_
[PATCH v8 08/13] docs: acpi_hest_ghes: fix documentation for CPER size
While the spec document defines a CPER size of 4 KiB for each record, the code currently sets it to 1 KiB. Fix the documentation and add a reference to the macro name there, as this should help keep it up to date. Signed-off-by: Mauro Carvalho Chehab Acked-by: Igor Mammedov --- docs/specs/acpi_hest_ghes.rst | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst index 68f1fbe0a4af..c3e9f8d9a702 100644 --- a/docs/specs/acpi_hest_ghes.rst +++ b/docs/specs/acpi_hest_ghes.rst @@ -67,8 +67,10 @@ Design Details (3) The address registers table contains N Error Block Address entries and N Read Ack Register entries. The size for each entry is 8-byte. The Error Status Data Block table contains N Error Status Data Block -entries. The size for each entry is 4096(0x1000) bytes. The total size -for the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes. +entries. The size for each entry is defined at the source code as +ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size +for the "etc/hardware_errors" fw_cfg blob is +(N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes. N is the number of the kinds of hardware error sources. (4) QEMU generates the ACPI linker/loader script for the firmware. The -- 2.46.0
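The layout described in the updated documentation maps directly to arithmetic. A small sketch, assuming ACPI_GHES_MAX_RAW_DATA_LENGTH is 1024 as the patch states; the helper names are hypothetical. With N = 2 sources, the blob is 2 * 8 * 2 + 2 * 1024 = 2080 bytes, and the Error Status Data Blocks start right after the 32 bytes of address and read ack entries:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACPI_GHES_MAX_RAW_DATA_LENGTH 1024 /* 1 KiB, per the updated text */

/* Size of the address and read ack tables: two 8-byte entries per source */
static size_t hw_errors_tables_size(size_t n)
{
    return n * sizeof(uint64_t) * 2;
}

/* Total size of the "etc/hardware_errors" blob for n error sources */
static size_t hw_errors_blob_size(size_t n)
{
    return hw_errors_tables_size(n) + n * ACPI_GHES_MAX_RAW_DATA_LENGTH;
}

/* Offset of the Error Status Data Block for source i (0 <= i < n) */
static size_t error_status_block_offset(size_t n, size_t i)
{
    assert(i < n);
    return hw_errors_tables_size(n) + i * ACPI_GHES_MAX_RAW_DATA_LENGTH;
}
```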
[PATCH v8 11/13] scripts/arm_processor_error.py: retrieve mpidr if not filled
Add support to retrieve mpidr value via qom-get. Signed-off-by: Mauro Carvalho Chehab --- scripts/arm_processor_error.py | 27 +++ 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py index 62e0c5662232..0a16d4f0d8b1 100644 --- a/scripts/arm_processor_error.py +++ b/scripts/arm_processor_error.py @@ -5,12 +5,10 @@ # # Copyright (C) 2024 Mauro Carvalho Chehab -# TODO: current implementation has dummy defaults. -# -# For a better implementation, a QMP addition/call is needed to -# retrieve some data for ARM Processor Error injection: -# -# - ARM registers: power_state, mpidr. +# Note: currently it lacks a method to fill the ARM Processor Error CPER +# psci field from emulation. On a real hardware, this is filled only +# when a CPU is not running. Implementing support for it to simulate a +# real hardware is not trivial. import argparse import re @@ -174,11 +172,24 @@ def send_cper(self, args): else: cper["running-state"] = 0 +if args.mpidr: +cper["mpidr-el1"] = arg["mpidr"] +elif cpus: +cmd_arg = { +'path': cpus[0], +'property': "x-mpidr" +} +ret = qmp_cmd.send_cmd("qom-get", cmd_arg, may_open=True) +if isinstance(ret, int): +cper["mpidr-el1"] = ret +else: +cper["mpidr-el1"] = 0 + if arm_valid_init: if args.affinity: cper["valid"] |= self.arm_valid_bits["affinity"] -if args.mpidr: +if "mpidr-el1" in cper: cper["valid"] |= self.arm_valid_bits["mpidr"] if "running-state" in cper: @@ -362,7 +373,7 @@ def send_cper(self, args): if isinstance(ret, int): arg["midr-el1"] = ret -util.data_add(data, arg.get("mpidr-el1", 0), 8) +util.data_add(data, cper["mpidr-el1"], 8) util.data_add(data, arg.get("midr-el1", 0), 8) util.data_add(data, cper["running-state"], 4) util.data_add(data, arg.get("psci-state", 0), 4) -- 2.46.0
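With this change, the MPIDR validation bit depends on whether an mpidr-el1 value actually ended up in the record (from --mpidr or from a qom-get of x-mpidr), not only on whether --mpidr was passed. The bit assignments below come from the arm_valid_bits table in arm_processor_error.py. The struct and function are a hypothetical C rendering of that Python logic, not code from the series:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARM Processor Error validation bits, as listed in arm_processor_error.py */
#define ARM_VALID_MPIDR    (1u << 0)
#define ARM_VALID_AFFINITY (1u << 1)
#define ARM_VALID_RUNNING  (1u << 2)
#define ARM_VALID_VENDOR   (1u << 3)

/* Hypothetical view of which optional CPER fields were filled in */
struct arm_cper_fields {
    bool has_affinity;
    bool has_mpidr;   /* set by --mpidr or by qom-get of x-mpidr */
    bool has_running;
};

/* Compute the "valid" field from the fields that are present */
static uint32_t arm_valid_bits(const struct arm_cper_fields *f)
{
    uint32_t valid = 0;

    if (f->has_affinity) {
        valid |= ARM_VALID_AFFINITY;
    }
    if (f->has_mpidr) {
        valid |= ARM_VALID_MPIDR;
    }
    if (f->has_running) {
        valid |= ARM_VALID_RUNNING;
    }
    return valid;
}
```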
[PATCH v8 03/13] acpi/ghes: Add support for GED error device
From: Jonathan Cameron As a GED error device is now defined, add another type of notification: error notification for GHES v2 using a GED error device triggered via interrupt. [mchehab: do some cleanups in the ACPI_HEST_SRC_ID_* checks and rename the HEST event to better identify it as the GED interrupt for OSPM] Signed-off-by: Jonathan Cameron Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Igor Mammedov --- hw/acpi/ghes.c | 11 +-- include/hw/acpi/ghes.h | 3 ++- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 13b105c5d02d..df59fd35568c 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -34,8 +34,8 @@ /* The max size in bytes for one error block */ #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB) -/* Now only support ARMv8 SEA notification type error source */ -#define ACPI_GHES_ERROR_SOURCE_COUNT 1 +/* Support ARMv8 SEA notification type error source and GPIO interrupt. */ +#define ACPI_GHES_ERROR_SOURCE_COUNT 2 /* Generic Hardware Error Source version 2 */ #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10 @@ -290,6 +290,9 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker) { uint64_t address_offset; + +assert(source_id < ACPI_HEST_SRC_ID_RESERVED); + /* * Type: * Generic Hardware Error Source version 2(GHESv2 - Type 10) @@ -327,6 +330,9 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker) */ build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA); break; +case ACPI_HEST_SRC_ID_GED: +build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_GPIO); +break; default: error_report("Not support this error source"); abort(); @@ -370,6 +376,7 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker, /* Error Source Count */ build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4); build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker); +build_ghes_v2(table_data, 
ACPI_HEST_SRC_ID_GED, linker); acpi_table_end(linker, &table); } diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index fb80897e7eac..419a97d5cbd9 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -59,9 +59,10 @@ enum AcpiGhesNotifyType { ACPI_GHES_NOTIFY_RESERVED = 12 }; +/* Those are used as table indexes when building GHES tables */ enum { ACPI_HEST_SRC_ID_SEA = 0, -/* future ids go here */ +ACPI_HEST_SRC_ID_GED, ACPI_HEST_SRC_ID_RESERVED, }; -- 2.46.0
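After this patch, each HEST source ID selects the notification structure written into its GHESv2 entry: SEA for the existing source and GPIO-Signal for the GED-backed one. A standalone sketch of that switch follows. The numeric values are the ACPI hardware error notification type codes (7 for GPIO-Signal, 8 for ARMv8 SEA) that enum AcpiGhesNotifyType mirrors; source_to_notify() is a hypothetical name:

```c
#include <assert.h>

/* Subset of enum AcpiGhesNotifyType (ACPI hardware error notification types) */
enum AcpiGhesNotifyType {
    ACPI_GHES_NOTIFY_GPIO = 7,      /* GPIO-Signal */
    ACPI_GHES_NOTIFY_SEA = 8,       /* ARMv8 Synchronous External Abort */
    ACPI_GHES_NOTIFY_RESERVED = 12,
};

/* Source IDs as introduced by this patch */
enum {
    ACPI_HEST_SRC_ID_SEA = 0,
    ACPI_HEST_SRC_ID_GED,
    ACPI_HEST_SRC_ID_RESERVED,
};

/* Hypothetical helper: notification type used for each GHESv2 entry */
static enum AcpiGhesNotifyType source_to_notify(int source_id)
{
    assert(source_id >= 0 && source_id < ACPI_HEST_SRC_ID_RESERVED);

    switch (source_id) {
    case ACPI_HEST_SRC_ID_SEA:
        return ACPI_GHES_NOTIFY_SEA;
    case ACPI_HEST_SRC_ID_GED:
        return ACPI_GHES_NOTIFY_GPIO;
    default:
        /* build_ghes_v2() calls abort() here instead */
        return ACPI_GHES_NOTIFY_RESERVED;
    }
}
```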
[PATCH v8 05/13] acpi/ghes: rework the logic to handle HEST source ID
The current logic relies on a lot of duct tape, with offsets calculated from one define holding the number of source IDs plus an enum. Rewrite the logic to make it more resilient to code changes, by moving the source ID count into the enum and making the offset calculation more explicit. This change was inspired by a patch from Jonathan Cameron that splits the logic to get the CPER address into a separate function, as this will be needed to support generic error injection. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes-stub.c | 3 +- hw/acpi/ghes.c | 210 --- hw/arm/virt-acpi-build.c | 5 +- include/hw/acpi/ghes.h | 17 ++-- 4 files changed, 138 insertions(+), 97 deletions(-) diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c index c315de1802d6..8762449870b5 100644 --- a/hw/acpi/ghes-stub.c +++ b/hw/acpi/ghes-stub.c @@ -11,7 +11,8 @@ #include "qemu/osdep.h" #include "hw/acpi/ghes.h" -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address) +int acpi_ghes_record_errors(enum AcpiGhesNotifyType notify, +uint64_t physical_address) { return -1; } diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index df59fd35568c..7870f51e2a9e 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -28,14 +28,23 @@ #include "hw/nvram/fw_cfg.h" #include "qemu/uuid.h" -#define ACPI_GHES_ERRORS_FW_CFG_FILE "etc/hardware_errors" -#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr" +#define ACPI_HW_ERROR_FW_CFG_FILE "etc/hardware_errors" +#define ACPI_HW_ERROR_ADDR_FW_CFG_FILE "etc/hardware_errors_addr" +#define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr" /* The max size in bytes for one error block */ #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB) -/* Support ARMv8 SEA notification type error source and GPIO interrupt.
*/ -#define ACPI_GHES_ERROR_SOURCE_COUNT 2 +/* + * ID numbers used to fill HEST source ID field + */ +enum AcpiHestSourceId { +ACPI_HEST_SRC_ID_SEA, +ACPI_HEST_SRC_ID_GED, + +/* Shall be the last one */ +ACPI_HEST_SRC_ID_COUNT +} AcpiHestSourceId; /* Generic Hardware Error Source version 2 */ #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10 @@ -63,6 +72,19 @@ */ #define ACPI_GHES_GESB_SIZE 20 +/* + * Offsets with regards to the start of the HEST table stored at + * ags->hest_addr_le, according with the memory layout map at + * docs/specs/acpi_hest_ghes.rst. + */ + +/* ACPI 4.0: 17.3.2 ACPI Error Source */ +#define ACPI_HEST_HEADER_SIZE 40 + +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2 */ +#define HEST_GHES_V2_TABLE_SIZE 92 +#define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET + ACPI_HEST_HEADER_SIZE) + /* * Values for error_severity field */ @@ -236,17 +258,17 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address, * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs. * See docs/specs/acpi_hest_ghes.rst for blobs format. */ -void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) +static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) { int i, error_status_block_offset; /* Build error_block_address */ -for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) { +for (i = 0; i < ACPI_HEST_SRC_ID_COUNT; i++) { build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t)); } /* Build read_ack_register */ -for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) { +for (i = 0; i < ACPI_HEST_SRC_ID_COUNT; i++) { /* * Initialize the value of read_ack_register to 1, so GHES can be * writable after (re)boot.
@@ -261,20 +283,20 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) /* Reserve space for Error Status Data Block */ acpi_data_push(hardware_errors, -ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT); +ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_HEST_SRC_ID_COUNT); /* Tell guest firmware to place hardware_errors blob into RAM */ -bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE, +bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE, hardware_errors, sizeof(uint64_t), false); -for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) { +for (i = 0; i < ACPI_HEST_SRC_ID_COUNT; i++) { /* * Tell firmware to patch error_block_address entries to point to * corresponding "Generic Error Status Block" */ bios_linker_loader_add_pointer(linker, -ACPI_GHES_ERRORS_FW_CFG_FILE, sizeof(uint64_t) * i, -sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, +ACPI_HW_
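GHES_ACK_OFFSET adds three pieces: the 40-byte HEST header, the offset of the Read Ack Register GAS inside a GHESv2 entry (64), and the offset of the address field inside that GAS (GAS_ADDR_OFFSET). The sketch below assumes GAS_ADDR_OFFSET is 4 (the address follows four one-byte fields in an ACPI Generic Address Structure) and that GHESv2 entries of HEST_GHES_V2_TABLE_SIZE bytes are laid out back to back in source ID order. ghes_ack_field_addr() is a hypothetical helper, not the series' code:

```c
#include <assert.h>
#include <stdint.h>

#define GAS_ADDR_OFFSET         4   /* assumed: address field offset in a GAS */
#define ACPI_HEST_HEADER_SIZE   40  /* ACPI table header (36) + source count (4) */
#define HEST_GHES_V2_TABLE_SIZE 92  /* one GHESv2 (type 10) entry */
#define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET + ACPI_HEST_HEADER_SIZE)

/*
 * Hypothetical helper: guest address of the Read Ack Register address
 * field for the GHESv2 entry at index idx, given HEST's guest address.
 * Assumes entries are contiguous and ordered by source ID.
 */
static uint64_t ghes_ack_field_addr(uint64_t hest_addr, unsigned int idx)
{
    return hest_addr + GHES_ACK_OFFSET +
           (uint64_t)idx * HEST_GHES_V2_TABLE_SIZE;
}
```

Under these assumptions, the first entry's ack address field sits 108 bytes into HEST and the second one 200 bytes in.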
[PATCH v8 00/13] Add ACPI CPER firmware first error injection on ARM emulation
ware Error]:error_info: 0x0091000f [9.364834] {1}[Hardware Error]: transaction type: Data Access [9.365599] {1}[Hardware Error]: cache error, operation type: Data write [9.366441] {1}[Hardware Error]: cache level: 2 [9.367005] {1}[Hardware Error]: processor context not corrupted [9.367753] {1}[Hardware Error]:physical fault address: 0xdeadbeef [9.374267] Memory failure: 0xdeadb: recovery action for free buddy page: Recovered This script currently supports ARM processor error CPER records, but it can easily be extended to other GHES notification types. Jonathan Cameron (1): acpi/ghes: Add support for GED error device Mauro Carvalho Chehab (12): acpi/generic_event_device: add an APEI error device arm/virt: Wire up a GED error device for ACPI / GHES qapi/acpi-hest: add an interface to do generic CPER error injection acpi/ghes: rework the logic to handle HEST source ID acpi/ghes: add support for generic error injection via QAPI acpi/ghes: cleanup the memory error code logic docs: acpi_hest_ghes: fix documentation for CPER size scripts/ghes_inject: add a script to generate GHES error inject target/arm: add an experimental mpidr arm cpu property object scripts/arm_processor_error.py: retrieve mpidr if not filled acpi/ghes: cleanup generic error data logic acpi/ghes: check if the BIOS pointers for HEST are correct MAINTAINERS| 10 + docs/specs/acpi_hest_ghes.rst | 6 +- hw/acpi/Kconfig| 5 + hw/acpi/aml-build.c| 10 + hw/acpi/generic_event_device.c | 8 + hw/acpi/ghes-stub.c| 3 +- hw/acpi/ghes.c | 362 - hw/acpi/ghes_cper.c| 33 ++ hw/acpi/ghes_cper_stub.c | 19 + hw/acpi/meson.build| 2 + hw/arm/Kconfig | 5 + hw/arm/virt-acpi-build.c | 6 +- hw/arm/virt.c | 12 +- include/hw/acpi/acpi_dev_interface.h | 1 + include/hw/acpi/aml-build.h| 2 + include/hw/acpi/generic_event_device.h | 1 + include/hw/acpi/ghes.h | 24 +- include/hw/arm/virt.h | 1 + qapi/acpi-hest.json| 36 ++ qapi/meson.build | 1 + qapi/qapi-schema.json | 1 + scripts/arm_processor_error.py | 388 ++ scripts/ghes_inject.py | 51 ++
scripts/qmp_helper.py | 702 + target/arm/cpu.c | 1 + target/arm/cpu.h | 1 + target/arm/helper.c| 10 +- target/arm/kvm.c | 2 +- 28 files changed, 1551 insertions(+), 152 deletions(-) create mode 100644 hw/acpi/ghes_cper.c create mode 100644 hw/acpi/ghes_cper_stub.c create mode 100644 qapi/acpi-hest.json create mode 100644 scripts/arm_processor_error.py create mode 100755 scripts/ghes_inject.py create mode 100644 scripts/qmp_helper.py -- 2.46.0
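In the guest log above, the last line reports 0xdeadb for the injected physical fault address 0xdeadbeef because the kernel's memory-failure handler prints the page frame number rather than the byte address. Assuming 4 KiB guest pages, that is a 12-bit right shift. A trivial C illustration, where phys_to_pfn() is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Page frame number containing a physical address */
static uint64_t phys_to_pfn(uint64_t paddr)
{
    return paddr >> PAGE_SHIFT;
}
```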
[PATCH v8 12/13] acpi/ghes: cleanup generic error data logic
Remove comments that are obvious. No functional changes. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 38 +++--- 1 file changed, 15 insertions(+), 23 deletions(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 4f7b6c5ad2b6..a822a5eafaa0 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -130,34 +130,28 @@ static void build_ghes_hw_error_notification(GArray *table, const uint8_t type) * ACPI 6.1: 18.3.2.7.1 Generic Error Data */ static void acpi_ghes_generic_error_data(GArray *table, -const uint8_t *section_type, uint32_t error_severity, -uint8_t validation_bits, uint8_t flags, -uint32_t error_data_length, QemuUUID fru_id, -uint64_t time_stamp) + const uint8_t *section_type, + uint32_t error_severity, + uint8_t validation_bits, + uint8_t flags, + uint32_t error_data_length, + QemuUUID fru_id, + uint64_t time_stamp) { const uint8_t fru_text[20] = {0}; -/* Section Type */ g_array_append_vals(table, section_type, 16); - -/* Error Severity */ build_append_int_noprefix(table, error_severity, 4); + /* Revision */ build_append_int_noprefix(table, 0x300, 2); -/* Validation Bits */ + build_append_int_noprefix(table, validation_bits, 1); -/* Flags */ build_append_int_noprefix(table, flags, 1); -/* Error Data Length */ build_append_int_noprefix(table, error_data_length, 4); -/* FRU Id */ g_array_append_vals(table, fru_id.data, ARRAY_SIZE(fru_id.data)); - -/* FRU Text */ g_array_append_vals(table, fru_text, sizeof(fru_text)); - -/* Timestamp */ build_append_int_noprefix(table, time_stamp, 8); } @@ -165,19 +159,17 @@ static void acpi_ghes_generic_error_data(GArray *table, * Generic Error Status Block * ACPI 6.1: 18.3.2.7.1 Generic Error Data */ -static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status, -uint32_t raw_data_offset, uint32_t raw_data_length, -uint32_t data_length, uint32_t error_severity) +static void acpi_ghes_generic_error_status(GArray *table, + uint32_t block_status, + uint32_t raw_data_offset, + uint32_t raw_data_length, 
+ uint32_t data_length, + uint32_t error_severity) { -/* Block Status */ build_append_int_noprefix(table, block_status, 4); -/* Raw Data Offset */ build_append_int_noprefix(table, raw_data_offset, 4); -/* Raw Data Length */ build_append_int_noprefix(table, raw_data_length, 4); -/* Data Length */ build_append_int_noprefix(table, data_length, 4); -/* Error Severity */ build_append_int_noprefix(table, error_severity, 4); } -- 2.46.0
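With the per-field comments removed, the layout of both structures is carried only by the byte widths passed to build_append_int_noprefix() and g_array_append_vals(). As a cross-check, this standalone sketch replays the same sequence of appends into a plain byte buffer. append_le() and append_bytes() are hypothetical stand-ins for the QEMU helpers. The sketch yields 20 bytes for the Generic Error Status Block header, matching ACPI_GHES_GESB_SIZE, and 72 bytes for a revision 0x300 Generic Error Data Entry:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for build_append_int_noprefix(): little-endian append */
static void append_le(uint8_t *buf, size_t *len, uint64_t value, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        buf[(*len)++] = (uint8_t)(value >> (8 * i));
    }
}

/* Copy a fixed-size byte field (GUIDs, FRU text) */
static void append_bytes(uint8_t *buf, size_t *len, const uint8_t *src,
                         size_t n)
{
    memcpy(buf + *len, src, n);
    *len += n;
}

/* Generic Error Status Block header: five 32-bit fields */
static size_t build_error_status(uint8_t *buf, uint32_t block_status,
                                 uint32_t raw_data_offset,
                                 uint32_t raw_data_length,
                                 uint32_t data_length,
                                 uint32_t error_severity)
{
    size_t len = 0;

    append_le(buf, &len, block_status, 4);
    append_le(buf, &len, raw_data_offset, 4);
    append_le(buf, &len, raw_data_length, 4);
    append_le(buf, &len, data_length, 4);
    append_le(buf, &len, error_severity, 4);
    return len;
}

/* Generic Error Data Entry, revision 0x300 */
static size_t build_error_data(uint8_t *buf, const uint8_t section_type[16],
                               uint32_t error_severity,
                               uint8_t validation_bits, uint8_t flags,
                               uint32_t error_data_length,
                               const uint8_t fru_id[16], uint64_t time_stamp)
{
    const uint8_t fru_text[20] = {0};
    size_t len = 0;

    append_bytes(buf, &len, section_type, 16);
    append_le(buf, &len, error_severity, 4);
    append_le(buf, &len, 0x300, 2);             /* Revision */
    append_le(buf, &len, validation_bits, 1);
    append_le(buf, &len, flags, 1);
    append_le(buf, &len, error_data_length, 4);
    append_bytes(buf, &len, fru_id, 16);
    append_bytes(buf, &len, fru_text, sizeof(fru_text));
    append_le(buf, &len, time_stamp, 8);
    return len;
}
```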
[PATCH v8 09/13] scripts/ghes_inject: add a script to generate GHES error inject
Using the QMP GHESv2 API requires preparing a raw data array containing a CPER record. Add a helper script with subcommands to prepare such data. Currently, only ARM Processor error CPER record is supported. Signed-off-by: Mauro Carvalho Chehab --- MAINTAINERS| 3 + scripts/arm_processor_error.py | 377 ++ scripts/ghes_inject.py | 51 +++ scripts/qmp_helper.py | 702 + 4 files changed, 1133 insertions(+) create mode 100644 scripts/arm_processor_error.py create mode 100755 scripts/ghes_inject.py create mode 100644 scripts/qmp_helper.py diff --git a/MAINTAINERS b/MAINTAINERS index 1d8091818899..249ed2858198 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2083,6 +2083,9 @@ S: Maintained F: hw/arm/ghes_cper.c F: hw/acpi/ghes_cper_stub.c F: qapi/acpi-hest.json +F: scripts/ghes_inject.py +F: scripts/arm_processor_error.py +F: scripts/qmp_helper.py ppc4xx L: qemu-...@nongnu.org diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py new file mode 100644 index ..62e0c5662232 --- /dev/null +++ b/scripts/arm_processor_error.py @@ -0,0 +1,377 @@ +#!/usr/bin/env python3 +# +# pylint: disable=C0301,C0114,R0903,R0912,R0913,R0914,R0915,W0511 +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (C) 2024 Mauro Carvalho Chehab + +# TODO: current implementation has dummy defaults. +# +# For a better implementation, a QMP addition/call is needed to +# retrieve some data for ARM Processor Error injection: +# +# - ARM registers: power_state, mpidr. + +import argparse +import re + +from qmp_helper import qmp, util, cper_guid + +class ArmProcessorEinj: +""" +Implements ARM Processor Error injection via GHES +""" + +DESC = """ +Generates an ARM processor error CPER, compatible with +UEFI 2.9A Errata. 
+""" + +ACPI_GHES_ARM_CPER_LENGTH = 40 +ACPI_GHES_ARM_CPER_PEI_LENGTH = 32 + +# Context types +CONTEXT_AARCH32_EL1 = 1 +CONTEXT_AARCH64_EL1 = 5 +CONTEXT_MISC_REG = 8 + +def __init__(self, subparsers): +"""Initialize the error injection class and add subparser""" + +# Valid choice values +self.arm_valid_bits = { +"mpidr":util.bit(0), +"affinity": util.bit(1), +"running": util.bit(2), +"vendor": util.bit(3), +} + +self.pei_flags = { +"first":util.bit(0), +"last": util.bit(1), +"propagated": util.bit(2), +"overflow": util.bit(3), +} + +self.pei_error_types = { +"cache":util.bit(1), +"tlb": util.bit(2), +"bus": util.bit(3), +"micro-arch": util.bit(4), +} + +self.pei_valid_bits = { +"multiple-error": util.bit(0), +"flags":util.bit(1), +"error-info": util.bit(2), +"virt-addr":util.bit(3), +"phy-addr": util.bit(4), +} + +self.data = bytearray() + +parser = subparsers.add_parser("arm", description=self.DESC) + +arm_valid_bits = ",".join(self.arm_valid_bits.keys()) +flags = ",".join(self.pei_flags.keys()) +error_types = ",".join(self.pei_error_types.keys()) +pei_valid_bits = ",".join(self.pei_valid_bits.keys()) + +# UEFI N.16 ARM Validation bits +g_arm = parser.add_argument_group("ARM processor") +g_arm.add_argument("--arm", "--arm-valid", + help=f"ARM valid bits: {arm_valid_bits}") +g_arm.add_argument("-a", "--affinity", "--level", "--affinity-level", + type=lambda x: int(x, 0), + help="Affinity level (when multiple levels apply)") +g_arm.add_argument("-l", "--mpidr", type=lambda x: int(x, 0), + help="Multiprocessor Affinity Register") +g_arm.add_argument("-i", "--midr", type=lambda x: int(x, 0), + help="Main ID Register") +g_arm.add_argument("-r", "--running", + action=argparse.BooleanOptionalAction, + default=None, + help="Indicates if the processor is running or not") +g_arm.add_argument("--psci", "--psci-state", + type=lambda x: int(x,
[PATCH v8 07/13] acpi/ghes: cleanup the memory error code logic
Reorganize the function's code so that it uses the raw CPER function, removing duplicated code. While here, rename the function to reflect what it actually does. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes-stub.c| 2 +- hw/acpi/ghes.c | 125 +++-- include/hw/acpi/ghes.h | 4 +- target/arm/kvm.c | 2 +- 4 files changed, 50 insertions(+), 83 deletions(-) diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c index 8762449870b5..a60ae07a8e7c 100644 --- a/hw/acpi/ghes-stub.c +++ b/hw/acpi/ghes-stub.c @@ -11,7 +11,7 @@ #include "qemu/osdep.h" #include "hw/acpi/ghes.h" -int acpi_ghes_record_errors(enum AcpiGhesNotifyType notify, +int acpi_ghes_memory_errors(enum AcpiGhesNotifyType notify, uint64_t physical_address) { return -1; diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index a3ae710dcf81..4f7b6c5ad2b6 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -206,51 +206,30 @@ static void acpi_ghes_build_append_mem_cper(GArray *table, build_append_int_noprefix(table, 0, 7); } -static int acpi_ghes_record_mem_error(uint64_t error_block_address, - uint64_t error_physical_addr) +static void +ghes_gen_err_data_uncorrectable_recoverable(GArray *block, +const uint8_t *section_type, +int data_length) { -GArray *block; - -/* Memory Error Section Type */ -const uint8_t uefi_cper_mem_sec[] = - UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \ - 0xED, 0x7C, 0x83, 0xB1); - /* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data, * Table 17-13 Generic Error Data Entry */ QemuUUID fru_id = {}; -uint32_t data_length; -block = g_array_new(false, true /* clear */, 1); - -/* This is the length if adding a new generic error data entry*/ -data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH; /* - * It should not run out of the preallocated memory if adding a new generic - * error data entry + * Calculate the size with this block.
No need to check for + * too big CPER, as CPER size is checked at ghes_record_cper_errors() */ -assert((data_length + ACPI_GHES_GESB_SIZE) <= -ACPI_GHES_MAX_RAW_DATA_LENGTH); +data_length += ACPI_GHES_GESB_SIZE; /* Build the new generic error status block header */ acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE, 0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE); /* Build this new generic error data entry header */ -acpi_ghes_generic_error_data(block, uefi_cper_mem_sec, +acpi_ghes_generic_error_data(block, section_type, ACPI_CPER_SEV_RECOVERABLE, 0, 0, ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0); - -/* Build the memory section CPER for above new generic error data entry */ -acpi_ghes_build_append_mem_cper(block, error_physical_addr); - -/* Write the generic error data entry into guest memory */ -cpu_physical_memory_write(error_block_address, block->data, block->len); - -g_array_free(block, true); - -return 0; } /* @@ -448,59 +427,10 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s, ags->present = true; } -int acpi_ghes_record_errors(enum AcpiGhesNotifyType notify, -uint64_t physical_address) -{ -uint64_t cper_addr, read_ack_register = 0; -uint64_t read_ack_start_addr; -enum AcpiHestSourceId source; -AcpiGedState *acpi_ged_state; -AcpiGhesState *ags; - -if (ghes_notify_to_source_id(ACPI_HEST_SRC_ID_SEA, &source)) { -error_report("GHES: Invalid error block/ack address(es) for notify %d", - notify); -return -1; -} - -acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, - NULL)); -g_assert(acpi_ged_state); -ags = &acpi_ged_state->ghes_state; - -cper_addr = le64_to_cpu(ags->ghes_addr_le); -cper_addr += ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t); -read_ack_start_addr = cper_addr + source * sizeof(uint64_t); - -cper_addr += ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t); -cper_addr += source * ACPI_GHES_MAX_RAW_DATA_LENGTH; - -if (!physical_address) { -error_report("can not find Generic Error Status Block for notify %d", - notify); -return -1; 
-} - -cpu_physical_memory_read(read_ack_start_addr, - &read_ack_register, sizeof(read_ack_register)); - -/* zero means OSPM does not acknowledge the error */ - -read_ack_register = cpu_to_le64(0); -/* - * Clear the Read Ack Register, OSPM will write it to 1 when - * it ackn
[PATCH v8 02/13] arm/virt: Wire up a GED error device for ACPI / GHES
Adds support to ARM virtualization to allow handling generic error ACPI Event via GED & error source device. It is aligned with Linux Kernel patch: https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.hu...@intel.com/ Co-authored-by: Mauro Carvalho Chehab Co-authored-by: Jonathan Cameron Signed-off-by: Jonathan Cameron Signed-off-by: Mauro Carvalho Chehab Acked-by: Igor Mammedov --- hw/acpi/ghes.c | 3 +++ hw/arm/virt-acpi-build.c | 1 + hw/arm/virt.c| 12 +++- include/hw/acpi/ghes.h | 3 +++ include/hw/arm/virt.h| 1 + 5 files changed, 19 insertions(+), 1 deletion(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index e9511d9b8f71..13b105c5d02d 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -444,6 +444,9 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address) return ret; } +NotifierList acpi_generic_error_notifiers = +NOTIFIER_LIST_INITIALIZER(error_device_notifiers); + bool acpi_ghes_present(void) { AcpiGedState *acpi_ged_state; diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index f76fb117adff..1769467d23b2 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -858,6 +858,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) } acpi_dsdt_add_power_button(scope); +aml_append(scope, aml_error_device()); #ifdef CONFIG_TPM acpi_dsdt_add_tpm(scope, vms); #endif diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 687fe0bb8bc9..22448e5c5b73 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -677,7 +677,7 @@ static inline DeviceState *create_acpi_ged(VirtMachineState *vms) DeviceState *dev; MachineState *ms = MACHINE(vms); int irq = vms->irqmap[VIRT_ACPI_GED]; -uint32_t event = ACPI_GED_PWR_DOWN_EVT; +uint32_t event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT; if (ms->ram_slots) { event |= ACPI_GED_MEM_HOTPLUG_EVT; @@ -1009,6 +1009,13 @@ static void virt_powerdown_req(Notifier *n, void *opaque) } } +static void virt_generic_error_req(Notifier *n, void *opaque) +{ +VirtMachineState 
*s = container_of(n, VirtMachineState, generic_error_notifier); + +acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR); +} + static void create_gpio_keys(char *fdt, DeviceState *pl061_dev, uint32_t phandle) { @@ -2385,6 +2392,9 @@ static void machvirt_init(MachineState *machine) if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) { vms->acpi_dev = create_acpi_ged(vms); +vms->generic_error_notifier.notify = virt_generic_error_req; +notifier_list_add(&acpi_generic_error_notifiers, + &vms->generic_error_notifier); } else { create_gpio_devices(vms, VIRT_GPIO, sysmem); } diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index 674f6958e905..fb80897e7eac 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -23,6 +23,9 @@ #define ACPI_GHES_H #include "hw/acpi/bios-linker-loader.h" +#include "qemu/notify.h" + +extern NotifierList acpi_generic_error_notifiers; /* * Values for Hardware Error Notification Type field diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index a4d937ed45ac..ad9f6e94dcc5 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -175,6 +175,7 @@ struct VirtMachineState { DeviceState *gic; DeviceState *acpi_dev; Notifier powerdown_notifier; +Notifier generic_error_notifier; PCIBus *bus; char *oem_id; char *oem_table_id; -- 2.46.0
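The wiring above depends on a `Notifier` embedded in `VirtMachineState` and on `container_of()` to get from the callback argument back to the machine state. Below is a simplified, self-contained stand-in that shows the pattern. It is not QEMU's `include/qemu/notify.h` implementation, and `FakeMachineState` and the list helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for QEMU's Notifier/NotifierList. */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *n, void *data);
    Notifier *next;
};

typedef struct {
    Notifier *head;
} NotifierList;

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void notifier_list_add(NotifierList *list, Notifier *n)
{
    n->next = list->head;
    list->head = n;
}

static void notifier_list_notify(NotifierList *list, void *data)
{
    for (Notifier *n = list->head; n; n = n->next) {
        n->notify(n, data);
    }
}

/* Hypothetical machine state embedding the notifier, like VirtMachineState. */
typedef struct {
    int events_sent;
    Notifier generic_error_notifier;
} FakeMachineState;

static void fake_generic_error_req(Notifier *n, void *opaque)
{
    /* Recover the enclosing machine state from the embedded notifier. */
    FakeMachineState *s = container_of(n, FakeMachineState, generic_error_notifier);

    (void)opaque;
    s->events_sent++;   /* stands in for acpi_send_event(..., ACPI_GENERIC_ERROR) */
}
```

Because only the machine code registers the notifier, ghes.c can raise the event with `notifier_list_notify()` without knowing which board is running.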
[PATCH v8 01/13] acpi/generic_event_device: add an APEI error device
Adds a generic error device to handle generic hardware error events as specified at ACPI 6.5 specification at 18.3.2.7.2: https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources using HID PNP0C33. The PNP0C33 device is used to report hardware errors to the guest via ACPI APEI Generic Hardware Error Source (GHES). Co-authored-by: Mauro Carvalho Chehab Co-authored-by: Jonathan Cameron Signed-off-by: Jonathan Cameron Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Igor Mammedov --- hw/acpi/aml-build.c| 10 ++ hw/acpi/generic_event_device.c | 8 include/hw/acpi/acpi_dev_interface.h | 1 + include/hw/acpi/aml-build.h| 2 ++ include/hw/acpi/generic_event_device.h | 1 + 5 files changed, 22 insertions(+) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index 6d4517cfbe3d..cb167523859f 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -2520,3 +2520,13 @@ Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source) return var; } + +/* ACPI 5.0: 18.3.2.6.2 Event Notification For Generic Error Sources */ +Aml *aml_error_device(void) +{ +Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE); +aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33"))); +aml_append(dev, aml_name_decl("_UID", aml_int(0))); + +return dev; +} diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c index 15b4c3ebbf24..b4c83a089a02 100644 --- a/hw/acpi/generic_event_device.c +++ b/hw/acpi/generic_event_device.c @@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = { ACPI_GED_PWR_DOWN_EVT, ACPI_GED_NVDIMM_HOTPLUG_EVT, ACPI_GED_CPU_HOTPLUG_EVT, +ACPI_GED_ERROR_EVT, }; /* @@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev, aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE), aml_int(0x80))); break; +case ACPI_GED_ERROR_EVT: +aml_append(if_ctx, + aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE), + aml_int(0x80))); +break; case ACPI_GED_NVDIMM_HOTPLUG_EVT: 
aml_append(if_ctx, aml_notify(aml_name("\\_SB.NVDR"), @@ -295,6 +301,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev) sel = ACPI_GED_MEM_HOTPLUG_EVT; } else if (ev & ACPI_POWER_DOWN_STATUS) { sel = ACPI_GED_PWR_DOWN_EVT; +} else if (ev & ACPI_GENERIC_ERROR) { +sel = ACPI_GED_ERROR_EVT; } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) { sel = ACPI_GED_NVDIMM_HOTPLUG_EVT; } else if (ev & ACPI_CPU_HOTPLUG_STATUS) { diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h index 68d9d15f50aa..8294f8f0ccca 100644 --- a/include/hw/acpi/acpi_dev_interface.h +++ b/include/hw/acpi/acpi_dev_interface.h @@ -13,6 +13,7 @@ typedef enum { ACPI_NVDIMM_HOTPLUG_STATUS = 16, ACPI_VMGENID_CHANGE_STATUS = 32, ACPI_POWER_DOWN_STATUS = 64, +ACPI_GENERIC_ERROR = 128, } AcpiEventStatusBits; #define TYPE_ACPI_DEVICE_IF "acpi-device-interface" diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h index a3784155cb33..44d1a6af0c69 100644 --- a/include/hw/acpi/aml-build.h +++ b/include/hw/acpi/aml-build.h @@ -252,6 +252,7 @@ struct CrsRangeSet { /* Consumer/Producer */ #define AML_SERIAL_BUS_FLAG_CONSUME_ONLY(1 << 1) +#define ACPI_APEI_ERROR_DEVICE "GEDD" /** * init_aml_allocator: * @@ -382,6 +383,7 @@ Aml *aml_dma(AmlDmaType typ, AmlDmaBusMaster bm, AmlTransferSize sz, uint8_t channel); Aml *aml_sleep(uint64_t msec); Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source); +Aml *aml_error_device(void); /* Block AML object primitives */ Aml *aml_scope(const char *name_format, ...) 
G_GNUC_PRINTF(1, 2); diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h index 40af3550b56d..9ace8fe70328 100644 --- a/include/hw/acpi/generic_event_device.h +++ b/include/hw/acpi/generic_event_device.h @@ -98,6 +98,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED) #define ACPI_GED_PWR_DOWN_EVT 0x2 #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4 #define ACPI_GED_CPU_HOTPLUG_EVT0x8 +#define ACPI_GED_ERROR_EVT 0x10 typedef struct GEDState { MemoryRegion evt; -- 2.46.0
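`acpi_ged_send_event()` translates an `AcpiEventStatusBits` value into a GED selector bit. The guest's GED `_EVT` method then issues `Notify(GEDD, 0x80)` for the error event. The sketch below reduces that translation to the two constants relevant here. The values are copied from the patch, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the patch above. */
enum {
    ACPI_POWER_DOWN_STATUS = 64,
    ACPI_GENERIC_ERROR     = 128,
};

#define ACPI_GED_PWR_DOWN_EVT 0x2
#define ACPI_GED_ERROR_EVT    0x10

/* Reduced version of the if/else chain in acpi_ged_send_event():
 * returns the GED selector bit for an event, or 0 if unsupported.
 * The hotplug cases of the real function are omitted. */
static uint32_t ged_event_selector(uint32_t ev)
{
    if (ev & ACPI_POWER_DOWN_STATUS) {
        return ACPI_GED_PWR_DOWN_EVT;
    } else if (ev & ACPI_GENERIC_ERROR) {
        return ACPI_GED_ERROR_EVT;
    }
    return 0;
}
```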
[PATCH v8 10/13] target/arm: add an experimental mpidr arm cpu property object
Accurately injecting an ARM Processor error ACPI/APEI GHES error record requires the value of the ARM Multiprocessor Affinity Register (mpidr). While ARM implements it, this is currently not visible. Add a field at CPU storing it, and place it at arm_cpu_properties as experimental, thus allowing it to be queried via QMP using qom-get function. Signed-off-by: Mauro Carvalho Chehab --- target/arm/cpu.c| 1 + target/arm/cpu.h| 1 + target/arm/helper.c | 10 -- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/target/arm/cpu.c b/target/arm/cpu.c index 19191c239181..30fcf0a10f46 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -2619,6 +2619,7 @@ static ObjectClass *arm_cpu_class_by_name(const char *cpu_model) static Property arm_cpu_properties[] = { DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0), +DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0), DEFINE_PROP_UINT64("mp-affinity", ARMCPU, mp_affinity, ARM64_AFFINITY_INVALID), DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID), diff --git a/target/arm/cpu.h b/target/arm/cpu.h index 9a3fd595621f..3ad4de793409 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -1033,6 +1033,7 @@ struct ArchCPU { uint64_t reset_pmcr_el0; } isar; uint64_t midr; +uint64_t mpidr; uint32_t revidr; uint32_t reset_fpsid; uint64_t ctr; diff --git a/target/arm/helper.c b/target/arm/helper.c index 0a582c1cd3b3..d6e7aa069489 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -4690,7 +4690,7 @@ static uint64_t mpidr_read_val(CPUARMState *env) return mpidr; } -static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri) +static uint64_t mpidr_read(CPUARMState *env) { unsigned int cur_el = arm_current_el(env); @@ -4700,6 +4700,11 @@ static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri) return mpidr_read_val(env); } +static uint64_t mpidr_read_ri(CPUARMState *env, const ARMCPRegInfo *ri) +{ +return mpidr_read(env); +} + static const ARMCPRegInfo lpae_cp_reginfo[] = { /* NOP AMAIR0/1 */ 
{ .name = "AMAIR0", .state = ARM_CP_STATE_BOTH, @@ -9721,7 +9726,7 @@ void register_cp_regs_for_features(ARMCPU *cpu) { .name = "MPIDR_EL1", .state = ARM_CP_STATE_BOTH, .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 5, .fgt = FGT_MPIDR_EL1, - .access = PL1_R, .readfn = mpidr_read, .type = ARM_CP_NO_RAW }, + .access = PL1_R, .readfn = mpidr_read_ri, .type = ARM_CP_NO_RAW }, }; #ifdef CONFIG_USER_ONLY static const ARMCPRegUserSpaceInfo mpidr_user_cp_reginfo[] = { @@ -9731,6 +9736,7 @@ void register_cp_regs_for_features(ARMCPU *cpu) modify_arm_cp_regs(mpidr_cp_reginfo, mpidr_user_cp_reginfo); #endif define_arm_cp_regs(cpu, mpidr_cp_reginfo); +cpu->mpidr = mpidr_read(env); } if (arm_feature(env, ARM_FEATURE_AUXCR)) { -- 2.46.0
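Once `x-mpidr` can be read with `qom-get`, an injection tool still has to split the value into affinity levels when it fills in the ARM processor error section. The sketch below does that decoding using the architectural MPIDR_EL1 layout: Aff0 is bits [7:0], Aff1 is [15:8], Aff2 is [23:16] and Aff3 is [39:32]. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return affinity level 0..3 from an MPIDR_EL1 value. */
static uint8_t mpidr_aff(uint64_t mpidr, unsigned level)
{
    static const unsigned shift[] = { 0, 8, 16, 32 };

    assert(level < 4);
    return (uint8_t)(mpidr >> shift[level]);
}
```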
[PATCH v8 13/13] acpi/ghes: check if the BIOS pointers for HEST are correct
OS kernels navigate between the HEST, the error source structures and the CPER records by following pointers. Double-check that these pointers were properly initialized, ensuring that they match the right CPER address. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 30 +- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index a822a5eafaa0..51e2e40e5a9c 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -85,6 +85,9 @@ enum AcpiHestSourceId { #define HEST_GHES_V2_TABLE_SIZE 92 #define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET + ACPI_HEST_HEADER_SIZE) +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source */ +#define GHES_ERR_ST_ADDR_OFFSET (20 + GAS_ADDR_OFFSET + ACPI_HEST_HEADER_SIZE) + /* * Values for error_severity field */ @@ -425,7 +428,10 @@ NotifierList acpi_generic_error_notifiers = void ghes_record_cper_errors(const void *cper, size_t len, enum AcpiGhesNotifyType notify, Error **errp) { -uint64_t cper_addr, read_ack_start_addr; +uint64_t hest_read_ack_start_addr, read_ack_start_addr; +uint64_t read_ack_start_addr_2, err_source_struct; +uint64_t hest_err_block_addr, error_block_addr; +uint64_t cper_addr, cper_addr_2; enum AcpiHestSourceId source; AcpiGedState *acpi_ged_state; AcpiGhesState *ags; @@ -450,6 +456,28 @@ void ghes_record_cper_errors(const void *cper, size_t len, cper_addr += ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t); cper_addr += source * ACPI_GHES_MAX_RAW_DATA_LENGTH; +err_source_struct = le64_to_cpu(ags->hest_addr_le) + +source * HEST_GHES_V2_TABLE_SIZE; + +/* Check if BIOS addr pointers were properly generated */ + +hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET; +hest_read_ack_start_addr = err_source_struct + GHES_ACK_OFFSET; + +cpu_physical_memory_read(hest_err_block_addr, &error_block_addr, + sizeof(error_block_addr)); + +cpu_physical_memory_read(error_block_addr, &cper_addr_2, + sizeof(error_block_addr)); + +cpu_physical_memory_read(hest_read_ack_start_addr, &read_ack_start_addr_2,
+sizeof(read_ack_start_addr_2)); + +assert(cper_addr == cper_addr_2); +assert(read_ack_start_addr == read_ack_start_addr_2); + +/* Update ACK offset to notify about a new error */ + cpu_physical_memory_read(read_ack_start_addr, &read_ack, sizeof(uint64_t)); -- 2.46.0
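The `cper_addr` and `read_ack_start_addr` values that this check compares against come from the layout of `etc/hardware_errors`. The blob holds all error-block address slots first, then all Read Ack registers, then one raw data block of `ACPI_GHES_MAX_RAW_DATA_LENGTH` bytes per source. The function below reproduces the same arithmetic as standalone code. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ACPI_GHES_MAX_RAW_DATA_LENGTH 1024  /* 1 KiB, as in ghes.c */

struct ghes_addrs {
    uint64_t read_ack;   /* Read Ack register for this source */
    uint64_t cper;       /* Generic Error Status Block for this source */
};

/* Same offset math as ghes_record_cper_errors(): skip `count` error-block
 * pointers to reach the Read Ack registers, then skip `count` Read Ack
 * registers to reach the per-source raw data blocks. */
static struct ghes_addrs ghes_compute_addrs(uint64_t base, unsigned count,
                                            unsigned source)
{
    struct ghes_addrs a;
    uint64_t p = base + (uint64_t)count * sizeof(uint64_t);

    a.read_ack = p + (uint64_t)source * sizeof(uint64_t);
    p += (uint64_t)count * sizeof(uint64_t);
    a.cper = p + (uint64_t)source * ACPI_GHES_MAX_RAW_DATA_LENGTH;
    return a;
}
```

With two sources and a blob at `0x1000`, source 1 gets its Read Ack register at `0x1018` and its status block at `0x1420`.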
[PATCH v8 06/13] acpi/ghes: add support for generic error injection via QAPI
Provide a generic interface for error injection via GHESv2. This patch is co-authored: - original ghes logic to inject a simple ARM record by Shiju Jose; - generic logic to handle block addresses by Jonathan Cameron; - generic GHESv2 error inject by Mauro Carvalho Chehab; Co-authored-by: Jonathan Cameron Co-authored-by: Shiju Jose Co-authored-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Cameron Signed-off-by: Shiju Jose Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 57 + hw/acpi/ghes_cper.c | 2 +- 2 files changed, 58 insertions(+), 1 deletion(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 7870f51e2a9e..a3ae710dcf81 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -500,6 +500,63 @@ int acpi_ghes_record_errors(enum AcpiGhesNotifyType notify, NotifierList acpi_generic_error_notifiers = NOTIFIER_LIST_INITIALIZER(error_device_notifiers); +void ghes_record_cper_errors(uint8_t *cper, size_t len, + enum AcpiGhesNotifyType notify, Error **errp) +{ +uint64_t cper_addr, read_ack_start_addr; +enum AcpiHestSourceId source; +AcpiGedState *acpi_ged_state; +AcpiGhesState *ags; +uint64_t read_ack; + +if (ghes_notify_to_source_id(notify, &source)) { +error_setg(errp, + "GHES: Invalid error block/ack address(es) for notify %d", + notify); +return; +} + +acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, + NULL)); +g_assert(acpi_ged_state); +ags = &acpi_ged_state->ghes_state; + +cper_addr = le64_to_cpu(ags->ghes_addr_le); +cper_addr += ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t); +read_ack_start_addr = cper_addr + source * sizeof(uint64_t); + +cper_addr += ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t); +cper_addr += source * ACPI_GHES_MAX_RAW_DATA_LENGTH; + +cpu_physical_memory_read(read_ack_start_addr, + &read_ack, sizeof(uint64_t)); + +/* zero means OSPM does not acknowledge the error */ +if (!read_ack) { +error_setg(errp, + "Last CPER record was not acknowledged yet"); +read_ack = 1; +cpu_physical_memory_write(read_ack_start_addr, + &read_ack, 
sizeof(uint64_t)); +return; +} + +read_ack = cpu_to_le64(0); +cpu_physical_memory_write(read_ack_start_addr, + &read_ack, sizeof(uint64_t)); + +/* Build CPER record */ + +if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) { +error_setg(errp, "GHES CPER record is too big: %ld", len); +} + +/* Write the generic error data entry into guest memory */ +cpu_physical_memory_write(cper_addr, cper, len); + +notifier_list_notify(&acpi_generic_error_notifiers, NULL); +} + bool acpi_ghes_present(void) { AcpiGedState *acpi_ged_state; diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c index 92ca84d738de..2328dbff7012 100644 --- a/hw/acpi/ghes_cper.c +++ b/hw/acpi/ghes_cper.c @@ -29,5 +29,5 @@ void qmp_ghes_cper(const char *qmp_cper, return; } -/* TODO: call a function at ghes */ +ghes_record_cper_errors(cper, len, ACPI_GHES_NOTIFY_GPIO, errp); } -- 2.46.0
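As posted, the size check in `ghes_record_cper_errors()` calls `error_setg()` but does not return. An oversized record would therefore still be written to guest memory. Also, `%ld` does not match a `size_t` argument; `%zu` would. The sketch below models the intended flow against a fake guest memory and returns early on an oversized record. As a design choice of this sketch rather than of the patch, it checks the size before touching the Read Ack state. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_RAW 1024  /* stands in for ACPI_GHES_MAX_RAW_DATA_LENGTH */

/* Hypothetical, simplified guest memory for one error source. */
struct fake_source {
    uint64_t read_ack;            /* non-zero: OSPM acknowledged last record */
    uint8_t  block[MAX_RAW];      /* Generic Error Status Block */
    int      notified;            /* number of GED notifications raised */
};

enum inject_result { INJECT_OK, INJECT_NOT_ACKED, INJECT_TOO_BIG };

static enum inject_result inject_cper(struct fake_source *src,
                                      const uint8_t *cper, size_t len)
{
    if (len > MAX_RAW) {
        return INJECT_TOO_BIG;         /* reject before writing anything */
    }
    if (!src->read_ack) {
        /* Previous record not acknowledged; like the patch, set the
         * register to 1 so that a later attempt can proceed. */
        src->read_ack = 1;
        return INJECT_NOT_ACKED;
    }
    src->read_ack = 0;                 /* new record pending */
    memcpy(src->block, cper, len);
    src->notified++;                   /* stands in for notifier_list_notify() */
    return INJECT_OK;
}
```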
Re: [PATCH v8 03/13] acpi/ghes: Add support for GED error device
On Mon, 19 Aug 2024 13:43:04 +0200, Igor Mammedov wrote:
> On Fri, 16 Aug 2024 09:37:35 +0200
> Mauro Carvalho Chehab wrote:
>
> > From: Jonathan Cameron
> >
> > As a GED error device is now defined, add another type
> > of notification.
> >
> > Add error notification to GHES v2 using
> >a GED error device GED triggered via interrupt.
>
> This is hard to parse, perhaps update so it would be
> more clear what does what
>
> >
> > [mchehab: do some cleanups at ACPI_HEST_SRC_ID_* checks and
> > rename HEST event to better identify GED interrupt OSPM]
> >
> > Signed-off-by: Jonathan Cameron
> > Signed-off-by: Mauro Carvalho Chehab
> > Reviewed-by: Igor Mammedov
> > ---
>
> in addition to change log in cover letter,
> I'd suggest to keep per patch change log as well (after ---),
> it helps reviewer to notice intended changes.
>
> > [...]
> > +case ACPI_HEST_SRC_ID_GED:
> > +build_ghes_hw_error_notification(table_data,
> > ACPI_GHES_NOTIFY_GPIO);
> While GPIO works for arm, it's not the case for other machines.
> I recall a suggestion to use ACPI_GHES_NOTIFY_EXTERNAL instead of GPIO one,
> but that got lost somewhere...

True, but the same also applies to SEA, which is ARMv8+.

After having everything in place, I confined the source ID handling to this code inside ghes.c:

enum AcpiHestSourceId {
    ACPI_HEST_SRC_ID_SEA,
    ACPI_HEST_SRC_ID_GED,

    /* Shall be the last one */
    ACPI_HEST_SRC_ID_COUNT
} AcpiHestSourceId;

static bool ghes_notify_to_source_id(enum AcpiGhesNotifyType notify,
                                     enum AcpiHestSourceId *source_id)
{
    switch (notify) {
    case ACPI_GHES_NOTIFY_SEA:             /* ARMv8 */
        *source_id = ACPI_HEST_SRC_ID_SEA;
        return false;
    case ACPI_GHES_NOTIFY_GPIO:
        *source_id = ACPI_HEST_SRC_ID_GED;
        return false;
    default:
        /* Unsupported notification types */
        return true;
    }
}

The only place where the source ID number is used is ghes_notify_to_source_id(). ACPI_HEST_SRC_ID_COUNT is still used elsewhere to initialize and fill in the HEST table and its error source structures.
In other words, the source ID field is filled in from the notification types defined in include/hw/acpi/ghes.h:

    ACPI_GHES_NOTIFY_POLLED = 0,
    ACPI_GHES_NOTIFY_EXTERNAL = 1,
    ACPI_GHES_NOTIFY_LOCAL = 2,
    ACPI_GHES_NOTIFY_SCI = 3,
    ACPI_GHES_NOTIFY_NMI = 4,
    ACPI_GHES_NOTIFY_CMCI = 5,
    ACPI_GHES_NOTIFY_MCE = 6,
    ACPI_GHES_NOTIFY_GPIO = 7,
    ACPI_GHES_NOTIFY_SEA = 8,
    ACPI_GHES_NOTIFY_SEI = 9,
    ACPI_GHES_NOTIFY_GSIV = 10,
    ACPI_GHES_NOTIFY_SDEI = 11,

(please notice that ACPI already defines "EXTERNAL" as being something else)

Now, if we want to add support for x86, we could add some ifdefs inside ghes.c, e.g. something like:

enum AcpiHestSourceId {
#ifdef TARGET_ARM
    ACPI_HEST_SRC_ID_SEA,
    ACPI_HEST_SRC_ID_GED,
#endif
#ifdef TARGET_I386
    ACPI_HEST_SRC_ID_MCE,
#endif
    /* Shall be the last one */
    ACPI_HEST_SRC_ID_COUNT
} AcpiHestSourceId;

and something similar in ghes_notify_to_source_id():

static bool ghes_notify_to_source_id(enum AcpiGhesNotifyType notify,
                                     enum AcpiHestSourceId *source_id)
{
    switch (notify) {
#ifdef TARGET_ARM
    case ACPI_GHES_NOTIFY_SEA:             /* ARMv8 */
        *source_id = ACPI_HEST_SRC_ID_SEA;
        return false;
    case ACPI_GHES_NOTIFY_GPIO:
        *source_id = ACPI_HEST_SRC_ID_GED;
        return false;
#endif
#ifdef TARGET_I386
    case ACPI_GHES_NOTIFY_MCE:
        *source_id = ACPI_HEST_SRC_ID_MCE;
        return false;
#endif
    default:
        /* Unsupported notification types */
        return true;
    }
}

An alternative would be to move the source ID/notification code out of ghes.c, placing it in hw/arm and hw/i386, but that would require more complex binding logic. Even if we decide to go that way, I would prefer not to do such a redesign now. It is better done when we are ready to add a notification type that works on x86 (MCE? SCI? NMI?).

Regards,
Mauro
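One way to avoid per-target `#ifdef`s would be for each board to pass ghes.c its own table of notification types, indexed by source ID, and for the lookup to search that table. The sketch below assumes that design. The tables and the function name are hypothetical, and the notification values are the ones listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ACPI_GHES_NOTIFY_MCE  = 6,
    ACPI_GHES_NOTIFY_GPIO = 7,
    ACPI_GHES_NOTIFY_SEA  = 8,
};

/* Hypothetical per-board tables: array index = HEST source ID. */
static const uint16_t arm_notify[] = { ACPI_GHES_NOTIFY_SEA, ACPI_GHES_NOTIFY_GPIO };
static const uint16_t x86_notify[] = { ACPI_GHES_NOTIFY_MCE };

/* Return the source ID that uses `notify`, or -1 if the board lacks it. */
static int notify_to_source_id(const uint16_t *table, size_t n, uint16_t notify)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i] == notify) {
            return (int)i;
        }
    }
    return -1;
}
```

With this approach, ghes.c stays target-agnostic. Each board decides which notifications exist and in what order their source IDs are assigned.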
Re: [PATCH v8 13/13] acpi/ghes: check if the BIOS pointers for HEST are correct
On Mon, 19 Aug 2024 16:07:33 +0200, Igor Mammedov wrote:
> > +err_source_struct = le64_to_cpu(ags->hest_addr_le) +
> > +source * HEST_GHES_V2_TABLE_SIZE;
>
> there is no guaranties that HEST table will contain only GHESv2 sources,
> and once such is added this place becomes broken.
>
> we need to iterate over HEST taking that into account
> and find only ghesv2 structure with source id of interest.
>
> This function (and acpi_ghes_record_errors() as well) taking source_id
> as input should be able to lookup pointers from HEST in guest RAM,
> very crude idea could look something like this:
>
> typedef struct hest_source_type2len{
>uint16_t type
>int len
> } hest_structure_type2len
>
> hest_structure_type2len supported_hest_sources[] = {
> /* Table 18-344 Generic Hardware Error Source version 2 (GHESv2)
> Structure */
> {.type = 10, .len = 92},
> }

Sounds interesting, but IMO it should be done only when HEST types other than GHES are added, because:

1. Right now, the file is acpi/ghes.c. Adding non-type 10 HEST structures there would be a little weird. It should likely be renamed to acpi/hest.c when that time comes.

2. ACPI 6.5 makes clear that the above only works up to type 11, as, from type 12 onward, the length is stored in the error source structure header, according to:
https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#error-source-structure-header-type-12-onward

3. Some types have a variable size. Starting from the beginning, type 0, as defined at:
https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#hardware-errors-and-error-sources
has:
    size = 40 + 24 * Number of Hardware banks

So a simple table with fixed sizes like the one above won't work; for types <= 11 the code would instead need a switch. Adding proper support for all 12 already-defined types sounds like a lot of work, as the code would need to calculate the size depending on the type, and we don't populate the HEST table with any type other than GHES anyway.
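A walk over the HEST error source structures, as Igor suggests, could look roughly like the sketch below. It only knows how to size GHESv2 entries, which are type 10 and 92 bytes long. This matches `{.type = 10, .len = 92}` in the quote and the "GHESv2 - Type 10" comment in `build_ghes_v2()`. The walk stops at any other type, because sizing those would need the type-specific decoding described in point 3. `entries` is assumed to point just past the HEST header, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACPI_HEST_TYPE_GHESV2   10
#define HEST_GHES_V2_TABLE_SIZE 92

static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Return the byte offset of the GHESv2 entry whose Source Id matches,
 * or -1 if it is not found or an entry of unknown size is reached.
 * Type is at offset 0 and Source Id at offset 2 of each entry. */
static long hest_find_ghesv2(const uint8_t *entries, size_t count,
                             uint16_t source_id)
{
    size_t off = 0;

    for (size_t i = 0; i < count; i++) {
        const uint8_t *e = entries + off;

        if (rd_le16(e) != ACPI_HEST_TYPE_GHESV2) {
            return -1;           /* cannot size this entry: stop */
        }
        if (rd_le16(e + 2) == source_id) {
            return (long)off;
        }
        off += HEST_GHES_V2_TABLE_SIZE;
    }
    return -1;
}
```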
Ok, we could still do something like this pseudo-code to get the error source offset: #define ACPI_HEST_TYPE_GHESV2 11 err_struct_offset = 0; for (i = 0; i < source_id_count; i++) { /* NOTE: Other types may have different sizes */ assert(ghes[i].type == ACPI_HEST_TYPE_GHESV2); if (ghes[i].source_id == source_id) break; err_struct_offset += HEST_GHES_V2_TABLE_SIZE; } assert (i < source_id_count); --- That's said, maybe this will just add unwanted complexity, as QEMU is already setting those offsets via bios_linker_loader_add_pointer(). So, an alternative for that is to merge the code on patch 13 with the one on patch 5, dropping the math calcus there and relying that QEMU will always handle properly bios links. See, the logic which constructs GHESv2 source IDs do this to create the links between HEST ACPI table and etc/hardware_errors: with: Per-source ID logic at build_ghes_v2(): address_offset = table_data->len; /* Error Status Address */ build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0, 4 /* QWord access */, 0); bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE, address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t), ACPI_HW_ERROR_FW_CFG_FILE, source_id * sizeof(uint64_t)); ... 
/* * Read Ack Register * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source * version 2 (GHESv2 - Type 10) */ address_offset = table_data->len; build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0, 4 /* QWord access */, 0); bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE, address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t), ACPI_HW_ERROR_FW_CFG_FILE, (ACPI_HEST_SRC_ID_COUNT + source_id) * sizeof(uint64_t)); HEST table creation logic inside build_ghes_error_table(): for (i = 0; i < ACPI_HEST_SRC_ID_COUNT; i++) { /* * Tell firmware to patch error_block_address entries to point to * corresponding "Generic Error Status Block" */ bios_linker_loader_add_pointer(linker, ACPI_HW_ERROR_FW_CFG_FILE, sizeof(uint64_t) * i, sizeof(uint64_t), ACPI_HW_ERROR_FW_CFG_FILE, error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH); } Using those, the location of the CPER and ack addresses is easy and won't require any math: /* GHESv2 CPER offset */ cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
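If the code follows the linker-patched pointers instead of recomputing offsets, it needs two reads to find the CPER block and one read to find the Read Ack register. The sketch below models guest RAM as a byte array, as a hypothetical stand-in for `cpu_physical_memory_read()`. It uses the GHESv2 field offsets implied by `GHES_ERR_ST_ADDR_OFFSET` and `GHES_ACK_OFFSET`, which are 20 and 64 bytes into the entry. Adding `GAS_ADDR_OFFSET` (4) reaches the address field inside each Generic Address Structure.

```c
#include <assert.h>
#include <stdint.h>

#define GAS_ADDR_OFFSET        4   /* address field inside a GAS */
#define GHESV2_ERR_STATUS_GAS 20   /* Error Status Address GAS */
#define GHESV2_READ_ACK_GAS   64   /* Read Ack Register GAS */

/* Hypothetical flat "guest RAM" for the sketch (little-endian). */
static uint8_t guest_ram[4096];

static uint64_t ram_read64(uint64_t addr)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | guest_ram[addr + i];
    }
    return v;
}

static void ram_write64(uint64_t addr, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        guest_ram[addr + i] = (uint8_t)(v >> (8 * i));
    }
}

/* Error Status Address -> 8-byte slot in etc/hardware_errors -> status
 * block; Read Ack GAS -> Read Ack register directly. */
static void ghesv2_lookup(uint64_t entry, uint64_t *cper, uint64_t *read_ack)
{
    uint64_t slot = ram_read64(entry + GHESV2_ERR_STATUS_GAS + GAS_ADDR_OFFSET);

    *cper = ram_read64(slot);
    *read_ack = ram_read64(entry + GHESV2_READ_ACK_GAS + GAS_ADDR_OFFSET);
}
```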
Re: [PATCH v8 05/13] acpi/ghes: rework the logic to handle HEST source ID
Em Mon, 19 Aug 2024 14:10:37 +0200 Igor Mammedov escreveu: > On Fri, 16 Aug 2024 09:37:37 +0200 > Mauro Carvalho Chehab wrote: > > > The current logic is based on a lot of duct tape, with > > offsets calculated based on one define with the number of > > source IDs and an enum. > > > > Rewrite the logic in a way that it would be more resilient > > of code changes, by moving the source ID count to an enum > > and make the offset calculus more explicit. > > > > Such change was inspired on a patch from Jonathan Cameron > > splitting the logic to get the CPER address on a separate > > function, as this will be needed to support generic error > > injection. > > patch does too many things, that it's hard to review. > Please split it up on smaller distinct parts, with more specific > commit messages. (see some comments below) True, but there's not much that can be done when doing it and still keeping the code working. I'll split the renames. > > > > Signed-off-by: Mauro Carvalho Chehab > > --- > > hw/acpi/ghes-stub.c | 3 +- > > hw/acpi/ghes.c | 210 --- > > hw/arm/virt-acpi-build.c | 5 +- > > include/hw/acpi/ghes.h | 17 ++-- > > 4 files changed, 138 insertions(+), 97 deletions(-) > > > > diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c > > index c315de1802d6..8762449870b5 100644 > > --- a/hw/acpi/ghes-stub.c > > +++ b/hw/acpi/ghes-stub.c > > @@ -11,7 +11,8 @@ > > #include "qemu/osdep.h" > > #include "hw/acpi/ghes.h" > > > > -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address) > > +int acpi_ghes_record_errors(enum AcpiGhesNotifyType notify, > > +uint64_t physical_address) > > { > > return -1; > > } > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c > > index df59fd35568c..7870f51e2a9e 100644 > > --- a/hw/acpi/ghes.c > > +++ b/hw/acpi/ghes.c > > @@ -28,14 +28,23 @@ > > #include "hw/nvram/fw_cfg.h" > > #include "qemu/uuid.h" > > > > -#define ACPI_GHES_ERRORS_FW_CFG_FILE"etc/hardware_errors" > > -#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE 
"etc/hardware_errors_addr" > > +#define ACPI_HW_ERROR_FW_CFG_FILE "etc/hardware_errors" > > +#define ACPI_HW_ERROR_ADDR_FW_CFG_FILE "etc/hardware_errors_addr" > split out renaming part into a presiding separate patch, > so it won't mask a new code > > > +#define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr" > > > > /* The max size in bytes for one error block */ > > #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB) > > > > > > -/* Support ARMv8 SEA notification type error source and GPIO interrupt. */ > > -#define ACPI_GHES_ERROR_SOURCE_COUNT2 > > +/* > > + * ID numbers used to fill HEST source ID field > > + */ > > +enum AcpiHestSourceId { > > +ACPI_HEST_SRC_ID_SEA, > > +ACPI_HEST_SRC_ID_GED, > > + > > +/* Shall be the last one */ > > +ACPI_HEST_SRC_ID_COUNT > > +} AcpiHestSourceId; > > > this rename also should go into its own separate patch. I opted to remove this completely and move it to arm/virt, as this specific set of sources is for ARM. On such split, I ended placing the QMP error injection as the first one, as this is probably the first one that we'll be mapping on x86 and other architectures. This way, the code at ghes.c won't rely on any hardcoded values. They'll be passed at target ACPI table preparation using this function: void acpi_build_hest(GArray *table_data, GArray *hardware_errors, BIOSLinker *linker, const uint16_t * const notify, int num_sources, const char *oem_id, const char *oem_table_id) On arm (at the rework patch, before adding GPIO method), the call to the HEST build table (and etc/hardware_errors init) is done via: static const uint16_t hest_ghes_notify[] = { [ARM_ACPI_HEST_SRC_ID_SEA] = ACPI_GHES_NOTIFY_SEA, }; void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables) { ... if (vms->ras) { acpi_add_table(table_offsets, tables_blob); acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker, hest_ghes_notify, sizeof(hest_ghes_notify), vms->oem_id, vms->oem_table_id); } ... 
This way, adding support for a new notification
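In the `acpi_build_hest()` call quoted above, `sizeof(hest_ghes_notify)` is passed as `num_sources`. For a `uint16_t` array, that is the size in bytes, which is twice the number of sources. A loop bounded by it would therefore read past the end of the table. The element count is what an `ARRAY_SIZE()`-style macro returns; QEMU provides one in `qemu/osdep.h`. A minimal check:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element count of an array, like QEMU's ARRAY_SIZE(). */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

enum { ACPI_GHES_NOTIFY_SEA = 8 };
enum { ARM_ACPI_HEST_SRC_ID_SEA, ARM_ACPI_HEST_SRC_ID_COUNT };

static const uint16_t hest_ghes_notify[] = {
    [ARM_ACPI_HEST_SRC_ID_SEA] = ACPI_GHES_NOTIFY_SEA,
};
```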
Re: [PATCH v8 06/13] acpi/ghes: add support for generic error injection via QAPI
On Mon, 19 Aug 2024 14:51:36 +0200, Igor Mammedov wrote:
> > +read_ack = 1;
> > +cpu_physical_memory_write(read_ack_start_addr,
> > + &read_ack, sizeof(uint64_t));
> we don't do this for SEV so, why are you setting it to 1 here?

According to https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#generic-hardware-error-source-version-2-ghesv2-type-10:

"These are the steps the OS must take once detecting an error from a particular GHESv2 error source:
OSPM detects error (via interrupt/exception or polling the block status)
OSPM copies the error status block
OSPM clears the block status field of the error status block
OSPM acknowledges the error via Read Ack register. For example:
OSPM reads the Read Ack register –> X
OSPM writes –> (( X & ReadAckPreserve) | ReadAckWrite)"

So, basically, the guest OS takes some time to detect that an error was raised. Once it detects the error, it needs to mark it as handled. IMO, this is needed regardless of the notification mechanism.

Regards,
Mauro
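The acknowledgement step quoted from the specification is a read-modify-write of the Read Ack register. The sketch below takes the preserve and write masks as parameters. It assumes that QEMU's GHESv2 entries use preserve = `~1ULL` and write = `0x1`, so the guest only sets bit 0. After the step, the register is non-zero, which is the value `ghes_record_cper_errors()` treats as "acknowledged". The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* OSPM acknowledgement: X = read(ReadAck);
 * write(ReadAck, (X & ReadAckPreserve) | ReadAckWrite). */
static uint64_t ospm_ack(uint64_t x, uint64_t preserve, uint64_t write)
{
    return (x & preserve) | write;
}
```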
[PATCH v9 11/12] target/arm: add an experimental mpidr arm cpu property object
Accurately injecting an ARM Processor error ACPI/APEI GHES error record requires the value of the ARM Multiprocessor Affinity Register (mpidr). While ARM implements it, this is currently not visible. Add a field at CPU storing it, and place it at arm_cpu_properties as experimental, thus allowing it to be queried via QMP using qom-get function. Signed-off-by: Mauro Carvalho Chehab --- target/arm/cpu.c| 1 + target/arm/cpu.h| 1 + target/arm/helper.c | 10 -- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/target/arm/cpu.c b/target/arm/cpu.c index 19191c239181..30fcf0a10f46 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -2619,6 +2619,7 @@ static ObjectClass *arm_cpu_class_by_name(const char *cpu_model) static Property arm_cpu_properties[] = { DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0), +DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0), DEFINE_PROP_UINT64("mp-affinity", ARMCPU, mp_affinity, ARM64_AFFINITY_INVALID), DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID), diff --git a/target/arm/cpu.h b/target/arm/cpu.h index 9a3fd595621f..3ad4de793409 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -1033,6 +1033,7 @@ struct ArchCPU { uint64_t reset_pmcr_el0; } isar; uint64_t midr; +uint64_t mpidr; uint32_t revidr; uint32_t reset_fpsid; uint64_t ctr; diff --git a/target/arm/helper.c b/target/arm/helper.c index 0a582c1cd3b3..d6e7aa069489 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -4690,7 +4690,7 @@ static uint64_t mpidr_read_val(CPUARMState *env) return mpidr; } -static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri) +static uint64_t mpidr_read(CPUARMState *env) { unsigned int cur_el = arm_current_el(env); @@ -4700,6 +4700,11 @@ static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri) return mpidr_read_val(env); } +static uint64_t mpidr_read_ri(CPUARMState *env, const ARMCPRegInfo *ri) +{ +return mpidr_read(env); +} + static const ARMCPRegInfo lpae_cp_reginfo[] = { /* NOP AMAIR0/1 */ 
{ .name = "AMAIR0", .state = ARM_CP_STATE_BOTH, @@ -9721,7 +9726,7 @@ void register_cp_regs_for_features(ARMCPU *cpu) { .name = "MPIDR_EL1", .state = ARM_CP_STATE_BOTH, .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 5, .fgt = FGT_MPIDR_EL1, - .access = PL1_R, .readfn = mpidr_read, .type = ARM_CP_NO_RAW }, + .access = PL1_R, .readfn = mpidr_read_ri, .type = ARM_CP_NO_RAW }, }; #ifdef CONFIG_USER_ONLY static const ARMCPRegUserSpaceInfo mpidr_user_cp_reginfo[] = { @@ -9731,6 +9736,7 @@ void register_cp_regs_for_features(ARMCPU *cpu) modify_arm_cp_regs(mpidr_cp_reginfo, mpidr_user_cp_reginfo); #endif define_arm_cp_regs(cpu, mpidr_cp_reginfo); +cpu->mpidr = mpidr_read(env); } if (arm_feature(env, ARM_FEATURE_AUXCR)) { -- 2.46.0
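Once the value is reachable via qom-get on the x-mpidr property, a consumer such as the injection script may want to look at its affinity fields. The following is an illustrative Python sketch only: mpidr_affinity() is a hypothetical helper, and the bit positions follow the AArch64 MPIDR_EL1 layout (Aff0 [7:0], Aff1 [15:8], Aff2 [23:16], MT bit 24, U bit 30, Aff3 [39:32]).

```python
def mpidr_affinity(mpidr: int) -> dict:
    """Split an AArch64 MPIDR_EL1 value into its affinity-related fields."""
    return {
        "aff0": mpidr & 0xFF,          # bits [7:0]
        "aff1": (mpidr >> 8) & 0xFF,   # bits [15:8]
        "aff2": (mpidr >> 16) & 0xFF,  # bits [23:16]
        "mt": (mpidr >> 24) & 0x1,     # bit 24: multithreading
        "u": (mpidr >> 30) & 0x1,      # bit 30: uniprocessor system
        "aff3": (mpidr >> 32) & 0xFF,  # bits [39:32]
    }


# Example: a value with Aff2=2, Aff1=3, Aff0=4 and the MT bit set.
fields = mpidr_affinity(0x81020304)
```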
[PATCH v9 07/12] arm/virt: Wire up a GED error device for ACPI / GHES
Adds support to ARM virtualization to allow handling generic error ACPI Event via GED & error source device. It is aligned with Linux Kernel patch: https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.hu...@intel.com/ Co-authored-by: Mauro Carvalho Chehab Co-authored-by: Jonathan Cameron Signed-off-by: Jonathan Cameron Signed-off-by: Mauro Carvalho Chehab Acked-by: Igor Mammedov --- Changes from v8: - Added a call to the function that produces GHES generic records, as this is now added earlier in this series. Signed-off-by: Mauro Carvalho Chehab --- hw/arm/virt-acpi-build.c | 1 + hw/arm/virt.c| 12 +++- include/hw/arm/virt.h| 1 + 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 39100c2822c2..9c36da3c831f 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -858,6 +858,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) } acpi_dsdt_add_power_button(scope); +aml_append(scope, aml_error_device()); #ifdef CONFIG_TPM acpi_dsdt_add_tpm(scope, vms); #endif diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 687fe0bb8bc9..22448e5c5b73 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -677,7 +677,7 @@ static inline DeviceState *create_acpi_ged(VirtMachineState *vms) DeviceState *dev; MachineState *ms = MACHINE(vms); int irq = vms->irqmap[VIRT_ACPI_GED]; -uint32_t event = ACPI_GED_PWR_DOWN_EVT; +uint32_t event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT; if (ms->ram_slots) { event |= ACPI_GED_MEM_HOTPLUG_EVT; @@ -1009,6 +1009,13 @@ static void virt_powerdown_req(Notifier *n, void *opaque) } } +static void virt_generic_error_req(Notifier *n, void *opaque) +{ +VirtMachineState *s = container_of(n, VirtMachineState, generic_error_notifier); + +acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR); +} + static void create_gpio_keys(char *fdt, DeviceState *pl061_dev, uint32_t phandle) { @@ -2385,6 +2392,9 @@ static void machvirt_init(MachineState *machine) 
if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) { vms->acpi_dev = create_acpi_ged(vms); +vms->generic_error_notifier.notify = virt_generic_error_req; +notifier_list_add(&acpi_generic_error_notifiers, + &vms->generic_error_notifier); } else { create_gpio_devices(vms, VIRT_GPIO, sysmem); } diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index d62d8d9db5ae..1c682d43fdac 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -175,6 +175,7 @@ struct VirtMachineState { DeviceState *gic; DeviceState *acpi_dev; Notifier powerdown_notifier; +Notifier generic_error_notifier; PCIBus *bus; char *oem_id; char *oem_table_id; -- 2.46.0
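On the GED side everything is driven by an event bitmask: create_acpi_ged() builds it, and the AML generator emits a handler for each supported event whose bit is set. The Python sketch below models only that masking logic; the numeric values are hypothetical placeholders (the real ones are defined in include/hw/acpi/generic_event_device.h), and only the power-down, error and memory-hotplug cases from the hunk above are modeled.

```python
# Hypothetical bit values, for illustration only.
ACPI_GED_MEM_HOTPLUG_EVT = 0x1
ACPI_GED_PWR_DOWN_EVT = 0x2
ACPI_GED_ERROR_EVT = 0x10

# Order only affects the order in which handlers would be emitted.
GED_SUPPORTED_EVENTS = [
    ACPI_GED_MEM_HOTPLUG_EVT,
    ACPI_GED_PWR_DOWN_EVT,
    ACPI_GED_ERROR_EVT,
]


def ged_event_mask(ram_slots: int) -> int:
    """Partially mirror create_acpi_ged(): power-down and error are always set."""
    event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT
    if ram_slots:
        event |= ACPI_GED_MEM_HOTPLUG_EVT
    return event


def events_with_handlers(event_mask: int) -> list:
    """Supported events whose bit is set, i.e. the ones that get an AML handler."""
    return [e for e in GED_SUPPORTED_EVENTS if event_mask & e]
```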
[PATCH v9 10/12] scripts/ghes_inject: add a script to generate GHES error inject
Using the QMP GHESv2 API requires preparing a raw data array containing a CPER record. Add a helper script with subcommands to prepare such data. Currently, only ARM Processor error CPER record is supported. Signed-off-by: Mauro Carvalho Chehab --- MAINTAINERS| 3 + scripts/arm_processor_error.py | 377 ++ scripts/ghes_inject.py | 51 +++ scripts/qmp_helper.py | 702 + 4 files changed, 1133 insertions(+) create mode 100644 scripts/arm_processor_error.py create mode 100755 scripts/ghes_inject.py create mode 100644 scripts/qmp_helper.py diff --git a/MAINTAINERS b/MAINTAINERS index 1d8091818899..249ed2858198 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2083,6 +2083,9 @@ S: Maintained F: hw/arm/ghes_cper.c F: hw/acpi/ghes_cper_stub.c F: qapi/acpi-hest.json +F: scripts/ghes_inject.py +F: scripts/arm_processor_error.py +F: scripts/qmp_helper.py ppc4xx L: qemu-...@nongnu.org diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py new file mode 100644 index ..62e0c5662232 --- /dev/null +++ b/scripts/arm_processor_error.py @@ -0,0 +1,377 @@ +#!/usr/bin/env python3 +# +# pylint: disable=C0301,C0114,R0903,R0912,R0913,R0914,R0915,W0511 +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (C) 2024 Mauro Carvalho Chehab + +# TODO: current implementation has dummy defaults. +# +# For a better implementation, a QMP addition/call is needed to +# retrieve some data for ARM Processor Error injection: +# +# - ARM registers: power_state, mpidr. + +import argparse +import re + +from qmp_helper import qmp, util, cper_guid + +class ArmProcessorEinj: +""" +Implements ARM Processor Error injection via GHES +""" + +DESC = """ +Generates an ARM processor error CPER, compatible with +UEFI 2.9A Errata. 
+""" + +ACPI_GHES_ARM_CPER_LENGTH = 40 +ACPI_GHES_ARM_CPER_PEI_LENGTH = 32 + +# Context types +CONTEXT_AARCH32_EL1 = 1 +CONTEXT_AARCH64_EL1 = 5 +CONTEXT_MISC_REG = 8 + +def __init__(self, subparsers): +"""Initialize the error injection class and add subparser""" + +# Valid choice values +self.arm_valid_bits = { +"mpidr":util.bit(0), +"affinity": util.bit(1), +"running": util.bit(2), +"vendor": util.bit(3), +} + +self.pei_flags = { +"first":util.bit(0), +"last": util.bit(1), +"propagated": util.bit(2), +"overflow": util.bit(3), +} + +self.pei_error_types = { +"cache":util.bit(1), +"tlb": util.bit(2), +"bus": util.bit(3), +"micro-arch": util.bit(4), +} + +self.pei_valid_bits = { +"multiple-error": util.bit(0), +"flags":util.bit(1), +"error-info": util.bit(2), +"virt-addr":util.bit(3), +"phy-addr": util.bit(4), +} + +self.data = bytearray() + +parser = subparsers.add_parser("arm", description=self.DESC) + +arm_valid_bits = ",".join(self.arm_valid_bits.keys()) +flags = ",".join(self.pei_flags.keys()) +error_types = ",".join(self.pei_error_types.keys()) +pei_valid_bits = ",".join(self.pei_valid_bits.keys()) + +# UEFI N.16 ARM Validation bits +g_arm = parser.add_argument_group("ARM processor") +g_arm.add_argument("--arm", "--arm-valid", + help=f"ARM valid bits: {arm_valid_bits}") +g_arm.add_argument("-a", "--affinity", "--level", "--affinity-level", + type=lambda x: int(x, 0), + help="Affinity level (when multiple levels apply)") +g_arm.add_argument("-l", "--mpidr", type=lambda x: int(x, 0), + help="Multiprocessor Affinity Register") +g_arm.add_argument("-i", "--midr", type=lambda x: int(x, 0), + help="Main ID Register") +g_arm.add_argument("-r", "--running", + action=argparse.BooleanOptionalAction, + default=None, + help="Indicates if the processor is running or not") +g_arm.add_argument("--psci", "--psci-state", + type=lambda x: int(x,
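The QMP interface expects the CPER as base64 text, so the script has to pack fixed-size little-endian fields and then encode them. A minimal Python sketch of that flow follows. data_add() is a hypothetical stand-in for the script's util.data_add(), and the command and argument names (ghes-cper, cper) are inferred from qmp_ghes_cper() in patch 08 rather than copied from qapi/acpi-hest.json, so treat them as assumptions.

```python
import base64
import json


def data_add(data: bytearray, value: int, num_bytes: int) -> None:
    """Append value as an unsigned little-endian integer of num_bytes bytes."""
    data.extend(value.to_bytes(num_bytes, byteorder="little"))


def qmp_messages(cper: bytes) -> str:
    """Build the QMP handshake plus an (assumed) ghes-cper request."""
    handshake = {"execute": "qmp_capabilities"}
    inject = {
        "execute": "ghes-cper",  # assumed command name
        "arguments": {"cper": base64.b64encode(cper).decode("ascii")},  # assumed argument name
    }
    return json.dumps(handshake) + "\n" + json.dumps(inject) + "\n"


payload = bytearray()
data_add(payload, 0x80000000, 8)  # e.g. an 8-byte MPIDR_EL1 field
print(qmp_messages(bytes(payload)), end="")
```

Each message is emitted on its own line, so the output can be written directly to a QMP socket.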
[PATCH v9 05/12] acpi/ghes: add a notifier to notify when error data is ready
Some error injection notification methods are asynchronous, like GPIO notification. Add a notifier to be used when the error record is ready to be sent to the guest OS. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 5 + include/hw/acpi/ghes.h | 3 +++ 2 files changed, 8 insertions(+) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 10ed9c0614ff..2b7103a678a1 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -402,6 +402,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s, ags->present = true; } +NotifierList acpi_generic_error_notifiers = +NOTIFIER_LIST_INITIALIZER(acpi_generic_error_notifiers); + void ghes_record_cper_errors(const void *cper, size_t len, uint16_t source_id, Error **errp) { @@ -492,6 +495,8 @@ void ghes_record_cper_errors(const void *cper, size_t len, /* Write the generic error data entry into guest memory */ cpu_physical_memory_write(cper_addr, cper, len); + +notifier_list_notify(&acpi_generic_error_notifiers, NULL); } int acpi_ghes_memory_errors(int source_id, uint64_t physical_address) diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index be53b7c53c91..b1ec9795270f 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -24,6 +24,9 @@ #include "hw/acpi/bios-linker-loader.h" #include "qapi/error.h" +#include "qemu/notify.h" + +extern NotifierList acpi_generic_error_notifiers; /* * Values for Hardware Error Notification Type field -- 2.46.0
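The notifier decouples the code that records the CPER from the code that raises the guest notification. Below is a simplified Python model of that flow. It is not QEMU's NotifierList API, callback ordering is not modeled faithfully, and ghes_record_cper_errors() here is only a stub standing in for the C function.

```python
from typing import Callable, List, Optional

Callback = Callable[[Optional[object]], None]


class NotifierList:
    """Toy stand-in for QEMU's NotifierList (not its real API)."""

    def __init__(self) -> None:
        self._callbacks: List[Callback] = []

    def add(self, callback: Callback) -> None:
        self._callbacks.append(callback)

    def notify(self, data: Optional[object] = None) -> None:
        # Iterate over a copy so a callback may register another one safely.
        for callback in list(self._callbacks):
            callback(data)


acpi_generic_error_notifiers = NotifierList()
raised_events: List[str] = []


def ghes_record_cper_errors(cper: bytes) -> None:
    """Stub: the real function first writes the CPER into guest memory."""
    acpi_generic_error_notifiers.notify(None)


# The machine (virt_generic_error_req() on arm/virt) subscribes once at init...
acpi_generic_error_notifiers.add(lambda _data: raised_events.append("ACPI_GENERIC_ERROR"))
# ...and every recorded CPER then triggers the guest notification.
ghes_record_cper_errors(b"\x00" * 16)
```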
[PATCH v9 03/12] acpi/ghes: rename etc/hardware_error file macros
Now that we also have a file to store the HEST data location, which is part of GHES, give a better name to the file where CPER records are stored. No functional changes. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 965fb1b36587..3190eb954de4 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -28,8 +28,8 @@ #include "hw/nvram/fw_cfg.h" #include "qemu/uuid.h" -#define ACPI_GHES_ERRORS_FW_CFG_FILE"etc/hardware_errors" -#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr" +#define ACPI_HW_ERROR_FW_CFG_FILE "etc/hardware_errors" +#define ACPI_HW_ERROR_ADDR_FW_CFG_FILE "etc/hardware_errors_addr" #define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr" /* The max size in bytes for one error block */ @@ -255,7 +255,7 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker, ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources); /* Tell guest firmware to place hardware_errors blob into RAM */ -bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE, +bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE, hardware_errors, sizeof(uint64_t), false); for (i = 0; i < num_sources; i++) { @@ -264,8 +264,8 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker, * corresponding "Generic Error Status Block" */ bios_linker_loader_add_pointer(linker, -ACPI_GHES_ERRORS_FW_CFG_FILE, sizeof(uint64_t) * i, -sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, +ACPI_HW_ERROR_FW_CFG_FILE, sizeof(uint64_t) * i, +sizeof(uint64_t), ACPI_HW_ERROR_FW_CFG_FILE, error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH); } @@ -273,9 +273,9 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker, * tell firmware to write hardware_errors GPA into * hardware_errors_addr fw_cfg, once the former has been initialized.
*/ -bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, 0, +bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0, sizeof(uint64_t), - ACPI_GHES_ERRORS_FW_CFG_FILE, 0); + ACPI_HW_ERROR_FW_CFG_FILE, 0); } /* Build Generic Hardware Error Source version 2 (GHESv2) */ @@ -315,7 +315,7 @@ static void build_ghes_v2(GArray *table_data, bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE, address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t), - ACPI_GHES_ERRORS_FW_CFG_FILE, + ACPI_HW_ERROR_FW_CFG_FILE, source_id * sizeof(uint64_t)); /* Notification Structure */ @@ -335,7 +335,7 @@ static void build_ghes_v2(GArray *table_data, bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE, address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t), - ACPI_GHES_ERRORS_FW_CFG_FILE, + ACPI_HW_ERROR_FW_CFG_FILE, (num_sources + source_id) * sizeof(uint64_t)); @@ -389,11 +389,11 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s, GArray *hardware_error) { /* Create a read-only fw_cfg file for GHES */ -fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data, +fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data, hardware_error->len); /* Create a read-write fw_cfg file for Address */ -fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL, +fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL, NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false); fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL, -- 2.46.0
[PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation
inity Register (MPIDR): 0x8000 [9.361643] {1}[Hardware Error]: running state: 0x0 [9.362142] {1}[Hardware Error]: Power State Coordination Interface state: 0 [9.362682] {1}[Hardware Error]: Error info structure 0: [9.363030] {1}[Hardware Error]: num errors: 2 [9.363656] {1}[Hardware Error]:error_type: 0x02: cache error [9.364163] {1}[Hardware Error]:error_info: 0x0091000f [9.364834] {1}[Hardware Error]: transaction type: Data Access [9.365599] {1}[Hardware Error]: cache error, operation type: Data write [9.366441] {1}[Hardware Error]: cache level: 2 [9.367005] {1}[Hardware Error]: processor context not corrupted [9.367753] {1}[Hardware Error]:physical fault address: 0xdeadbeef [9.374267] Memory failure: 0xdeadb: recovery action for free buddy page: Recovered This script currently supports the ARM processor error CPER, but it can easily be extended to other GHES notification types. Mauro Carvalho Chehab (12): acpi/ghes: add a firmware file with HEST address acpi/ghes: rework the logic to handle HEST source ID acpi/ghes: rename etc/hardware_error file macros acpi/ghes: better name GHES memory error function acpi/ghes: add a notifier to notify when error data is ready acpi/generic_event_device: add an APEI error device arm/virt: Wire up a GED error device for ACPI / GHES qapi/acpi-hest: add an interface to do generic CPER error injection docs: acpi_hest_ghes: fix documentation for CPER size scripts/ghes_inject: add a script to generate GHES error inject target/arm: add an experimental mpidr arm cpu property object scripts/arm_processor_error.py: retrieve mpidr if not filled MAINTAINERS| 10 + docs/specs/acpi_hest_ghes.rst | 6 +- hw/acpi/Kconfig| 5 + hw/acpi/aml-build.c| 10 + hw/acpi/generic_event_device.c | 8 + hw/acpi/ghes-stub.c| 2 +- hw/acpi/ghes.c | 309 +++ hw/acpi/ghes_cper.c| 32 ++ hw/acpi/ghes_cper_stub.c | 19 + hw/acpi/meson.build| 2 + hw/arm/Kconfig | 5 + hw/arm/virt-acpi-build.c | 12 +- hw/arm/virt.c | 12 +- include/hw/acpi/acpi_dev_interface.h | 1 + 
include/hw/acpi/aml-build.h| 2 + include/hw/acpi/generic_event_device.h | 1 + include/hw/acpi/ghes.h | 28 +- include/hw/arm/virt.h | 10 + qapi/acpi-hest.json| 36 ++ qapi/meson.build | 1 + qapi/qapi-schema.json | 1 + scripts/arm_processor_error.py | 388 ++ scripts/ghes_inject.py | 51 ++ scripts/qmp_helper.py | 702 + target/arm/cpu.c | 1 + target/arm/cpu.h | 1 + target/arm/helper.c| 10 +- target/arm/kvm.c | 3 +- 28 files changed, 1539 insertions(+), 129 deletions(-) create mode 100644 hw/acpi/ghes_cper.c create mode 100644 hw/acpi/ghes_cper_stub.c create mode 100644 qapi/acpi-hest.json create mode 100644 scripts/arm_processor_error.py create mode 100755 scripts/ghes_inject.py create mode 100644 scripts/qmp_helper.py -- 2.46.0
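The error_info value in the log above can be decoded by hand. Here is a Python sketch, assuming the UEFI ARM cache error information layout: validation bits in [15:0], transaction type in [17:16], operation in [21:18], cache level in [24:22] and "processor context corrupt" in bit 25. decode_cache_error_info() is a hypothetical helper, and only a subset of the operation encodings is listed.

```python
CACHE_TRANSACTION = {0: "Instruction", 1: "Data Access", 2: "Generic"}
CACHE_OPERATION = {  # subset of the UEFI encodings
    0: "Generic error", 1: "Generic read", 2: "Generic write",
    3: "Data read", 4: "Data write", 5: "Instruction fetch",
    6: "Prefetch", 7: "Eviction",
}


def decode_cache_error_info(info: int) -> dict:
    """Decode only the fields whose validation bit is set."""
    valid = info & 0xFFFF
    out = {}
    if valid & 0x1:
        out["transaction"] = CACHE_TRANSACTION.get((info >> 16) & 0x3, "unknown")
    if valid & 0x2:
        out["operation"] = CACHE_OPERATION.get((info >> 18) & 0xF, "unknown")
    if valid & 0x4:
        out["level"] = (info >> 22) & 0x7
    if valid & 0x8:
        out["context_corrupt"] = bool((info >> 25) & 0x1)
    return out


print(decode_cache_error_info(0x0091000F))
# {'transaction': 'Data Access', 'operation': 'Data write', 'level': 2, 'context_corrupt': False}
```

For 0x0091000f this reproduces what the kernel printed: data access, data write, cache level 2, processor context not corrupted.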
[PATCH v9 08/12] qapi/acpi-hest: add an interface to do generic CPER error injection
Create a QMP command for generic ACPI APEI hardware error injection (HEST) via GHESv2, and add support for it on ARM guests. Error injection uses the ACPI_HEST_SRC_ID_QMP source ID to be platform independent. This is mapped in the arch virt bindings, depending on the types supported by QEMU and by the BIOS. On ARM, it is supported via the ACPI_GHES_NOTIFY_GPIO notification type. This patch is co-authored by: - original ghes logic to inject a simple ARM record by Shiju Jose; - generic logic to handle block addresses by Jonathan Cameron; - generic GHESv2 error injection by Mauro Carvalho Chehab; Co-authored-by: Jonathan Cameron Co-authored-by: Shiju Jose Co-authored-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Cameron Signed-off-by: Shiju Jose Signed-off-by: Mauro Carvalho Chehab --- MAINTAINERS | 7 +++ hw/acpi/Kconfig | 5 + hw/acpi/ghes_cper.c | 32 hw/acpi/ghes_cper_stub.c | 19 +++ hw/acpi/meson.build | 2 ++ hw/arm/Kconfig | 5 + hw/arm/virt-acpi-build.c | 1 + include/hw/acpi/ghes.h | 4 include/hw/arm/virt.h| 2 ++ qapi/acpi-hest.json | 36 qapi/meson.build | 1 + qapi/qapi-schema.json| 1 + 12 files changed, 115 insertions(+) create mode 100644 hw/acpi/ghes_cper.c create mode 100644 hw/acpi/ghes_cper_stub.c create mode 100644 qapi/acpi-hest.json diff --git a/MAINTAINERS b/MAINTAINERS index 3584d6a6c6da..1d8091818899 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2077,6 +2077,13 @@ F: hw/acpi/ghes.c F: include/hw/acpi/ghes.h F: docs/specs/acpi_hest_ghes.rst +ACPI/HEST/GHES/ARM processor CPER +R: Mauro Carvalho Chehab +S: Maintained +F: hw/arm/ghes_cper.c +F: hw/acpi/ghes_cper_stub.c +F: qapi/acpi-hest.json + ppc4xx L: qemu-...@nongnu.org S: Orphan diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig index e07d3204eb36..73ffbb82c150 100644 --- a/hw/acpi/Kconfig +++ b/hw/acpi/Kconfig @@ -51,6 +51,11 @@ config ACPI_APEI bool depends on ACPI +config GHES_CPER +bool +depends on ACPI_APEI +default y + config ACPI_PCI bool depends on ACPI && PCI diff --git a/hw/acpi/ghes_cper.c 
b/hw/acpi/ghes_cper.c new file mode 100644 index ..a10a5e7ab29b --- /dev/null +++ b/hw/acpi/ghes_cper.c @@ -0,0 +1,32 @@ +/* + * CPER payload parser for error injection + * + * Copyright(C) 2024 Huawei LTD. + * + * This code is licensed under the GPL version 2 or later. See the + * COPYING file in the top-level directory. + * + */ + +#include "qemu/osdep.h" + +#include "qemu/base64.h" +#include "qemu/error-report.h" +#include "qemu/uuid.h" +#include "qapi/qapi-commands-acpi-hest.h" +#include "hw/acpi/ghes.h" + +void qmp_ghes_cper(const char *qmp_cper, Error **errp) +{ + +uint8_t *cper; +size_t len; + +cper = qbase64_decode(qmp_cper, -1, &len, errp); +if (!cper) { +return; +} + +ghes_record_cper_errors(cper, len, ACPI_HEST_SRC_ID_QMP, errp); +g_free(cper); +} diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c new file mode 100644 index ..36138c462ac9 --- /dev/null +++ b/hw/acpi/ghes_cper_stub.c @@ -0,0 +1,19 @@ +/* + * Stub interface for CPER payload parser for error injection + * + * Copyright(C) 2024 Huawei LTD. + * + * This code is licensed under the GPL version 2 or later. See the + * COPYING file in the top-level directory. 
+ * + */ + +#include "qemu/osdep.h" +#include "qapi/error.h" +#include "qapi/qapi-commands-acpi-hest.h" +#include "hw/acpi/ghes.h" + +void qmp_ghes_cper(const char *cper, Error **errp) +{ +error_setg(errp, "GHES QMP error inject is not compiled in"); +} diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build index fa5c07db9068..6cbf430eb66d 100644 --- a/hw/acpi/meson.build +++ b/hw/acpi/meson.build @@ -34,4 +34,6 @@ endif system_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c', 'acpi_interface.c')) system_ss.add(when: 'CONFIG_ACPI_PCI_BRIDGE', if_false: files('pci-bridge-stub.c')) system_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss) +system_ss.add(when: 'CONFIG_GHES_CPER', if_true: files('ghes_cper.c')) +system_ss.add(when: 'CONFIG_GHES_CPER', if_false: files('ghes_cper_stub.c')) system_ss.add(files('acpi-qmp-cmds.c')) diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index 1ad60da7aa2d..bed6ba27d715 100644 --- a/hw/arm/Kconfig +++ b/hw/arm/Kconfig @@ -712,3 +712,8 @@ config ARMSSE select UNIMP select SSE_COUNTER select SSE_TIMER + +config GHES_CPER +bool +depends on ARM +default y if AARCH64 diff --git a/hw/arm/virt
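A QMP handler like qmp_ghes_cper() needs a simple control flow: decode, fail with exactly one error if decoding fails, then hand the buffer over. In the C code, qbase64_decode() already sets errp when it returns NULL, so a second error_setg() on the same errp would trip QEMU's check that an error is not set twice. Below is a Python model of that flow; QmpError and the record callback are hypothetical stand-ins for Error ** and ghes_record_cper_errors().

```python
import base64
import binascii
from typing import Callable


class QmpError(Exception):
    """Stand-in for a QMP error reply (Error ** in the C code)."""


def ghes_cper(cper_b64: str, record: Callable[[bytes], None]) -> None:
    """Decode a base64 CPER payload and pass it to record()."""
    try:
        cper = base64.b64decode(cper_b64, validate=True)
    except binascii.Error as exc:
        # Report a single error that carries the decoder's message.
        raise QmpError(f"invalid base64 CPER payload: {exc}") from exc
    if not cper:
        raise QmpError("missing GHES CPER payload")
    record(cper)
```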
[PATCH v9 09/12] docs: acpi_hest_ghes: fix documentation for CPER size
While the spec defines a CPER size of 4KiB for each record, currently it is set to 1KiB. Fix the documentation and add a pointer to the macro name there, as this may help to keep it updated. Signed-off-by: Mauro Carvalho Chehab Acked-by: Igor Mammedov --- docs/specs/acpi_hest_ghes.rst | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst index 68f1fbe0a4af..c3e9f8d9a702 100644 --- a/docs/specs/acpi_hest_ghes.rst +++ b/docs/specs/acpi_hest_ghes.rst @@ -67,8 +67,10 @@ Design Details (3) The address registers table contains N Error Block Address entries and N Read Ack Register entries. The size for each entry is 8-byte. The Error Status Data Block table contains N Error Status Data Block -entries. The size for each entry is 4096(0x1000) bytes. The total size -for the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes. +entries. The size for each entry is defined at the source code as +ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size +for the "etc/hardware_errors" fw_cfg blob is +(N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes. N is the number of the kinds of hardware error sources. (4) QEMU generates the ACPI linker/loader script for the firmware. The -- 2.46.0
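As a quick sanity check of the corrected formula, the size of the etc/hardware_errors blob for N error sources can be computed directly. The helper name is illustrative, and ACPI_GHES_MAX_RAW_DATA_LENGTH is taken as 1024 bytes, the value stated above.

```python
ACPI_GHES_MAX_RAW_DATA_LENGTH = 1024  # 1 KiB, the value the documentation now cites
ADDRESS_ENTRY_SIZE = 8                # each Error Block Address / Read Ack Register entry


def hardware_errors_blob_size(num_sources: int) -> int:
    """Size in bytes of the etc/hardware_errors blob for num_sources sources."""
    address_registers = num_sources * ADDRESS_ENTRY_SIZE * 2  # both register tables
    status_blocks = num_sources * ACPI_GHES_MAX_RAW_DATA_LENGTH
    return address_registers + status_blocks
```

With a single error source this gives 16 + 1024 = 1040 bytes.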
[PATCH v9 02/12] acpi/ghes: rework the logic to handle HEST source ID
The current logic is based on a lot of duct tape, with offsets calculated from one define holding the number of source IDs plus an enum. Rewrite the logic so that it is more resilient to code changes, by moving the source ID count into an enum and making the offset calculation more explicit. This change was inspired by a patch from Jonathan Cameron that splits the logic to get the CPER address into a separate function, as this will be needed to support generic error injection. Signed-off-by: Mauro Carvalho Chehab --- Changes from v8: - Non-rename/cleanup changes merged altogether; - source ID is now more generic, defined per guest target. That should make it easier to add support for x86. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 275 --- hw/arm/virt-acpi-build.c | 10 +- include/hw/acpi/ghes.h | 18 +-- include/hw/arm/virt.h| 7 + target/arm/kvm.c | 3 +- 5 files changed, 198 insertions(+), 115 deletions(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 529c14e3289f..965fb1b36587 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -35,9 +35,6 @@ /* The max size in bytes for one error block */ #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB) -/* Now only support ARMv8 SEA notification type error source */ -#define ACPI_GHES_ERROR_SOURCE_COUNT1 - /* Generic Hardware Error Source version 2 */ #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10 @@ -64,6 +61,19 @@ */ #define ACPI_GHES_GESB_SIZE 20 +/* + * Offsets with regards to the start of the HEST table stored at + * ags->hest_addr_le, according with the memory layout map at + * docs/specs/acpi_hest_ghes.rst. 
+ */ + +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2 */ +#define HEST_GHES_V2_TABLE_SIZE 92 +#define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET) + +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source */ +#define GHES_ERR_ST_ADDR_OFFSET (20 + GAS_ADDR_OFFSET) + /* * Values for error_severity field */ @@ -185,51 +195,30 @@ static void acpi_ghes_build_append_mem_cper(GArray *table, build_append_int_noprefix(table, 0, 7); } -static int acpi_ghes_record_mem_error(uint64_t error_block_address, - uint64_t error_physical_addr) +static void +ghes_gen_err_data_uncorrectable_recoverable(GArray *block, +const uint8_t *section_type, +int data_length) { -GArray *block; - -/* Memory Error Section Type */ -const uint8_t uefi_cper_mem_sec[] = - UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \ - 0xED, 0x7C, 0x83, 0xB1); - /* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data, * Table 17-13 Generic Error Data Entry */ QemuUUID fru_id = {}; -uint32_t data_length; -block = g_array_new(false, true /* clear */, 1); - -/* This is the length if adding a new generic error data entry*/ -data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH; /* - * It should not run out of the preallocated memory if adding a new generic - * error data entry + * Calculate the size with this block. 
No need to check for + * too big CPER, as CPER size is checked at ghes_record_cper_errors() */ -assert((data_length + ACPI_GHES_GESB_SIZE) <= -ACPI_GHES_MAX_RAW_DATA_LENGTH); +data_length += ACPI_GHES_GESB_SIZE; /* Build the new generic error status block header */ acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE, 0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE); /* Build this new generic error data entry header */ -acpi_ghes_generic_error_data(block, uefi_cper_mem_sec, +acpi_ghes_generic_error_data(block, section_type, ACPI_CPER_SEV_RECOVERABLE, 0, 0, ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0); - -/* Build the memory section CPER for above new generic error data entry */ -acpi_ghes_build_append_mem_cper(block, error_physical_addr); - -/* Write the generic error data entry into guest memory */ -cpu_physical_memory_write(error_block_address, block->data, block->len); - -g_array_free(block, true); - -return 0; } /* @@ -237,17 +226,18 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address, * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs. * See docs/specs/acpi_hest_ghes.rst for blobs format. */ -void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) +static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker, + int num_sources) { int i, error_status_block_offset; /* Build error_block_address */ -for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) { +for (i = 0; i < num_sources; i++) { build_
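The magic numbers behind HEST_GHES_V2_TABLE_SIZE, GHES_ERR_ST_ADDR_OFFSET and GHES_ACK_OFFSET can be cross-checked by summing the GHESv2 field sizes. In the Python sketch below, the field list is transcribed from the ACPI GHESv2 structure layout with shortened names, and GAS_ADDR_OFFSET = 4 reflects the position of the Address field inside a 12-byte Generic Address Structure.

```python
GAS_ADDR_OFFSET = 4  # Address field inside a 12-byte Generic Address Structure

# (field, size in bytes) of a GHESv2 error source structure, in order.
GHES_V2_FIELDS = [
    ("type", 2),
    ("source_id", 2),
    ("related_source_id", 2),
    ("flags", 1),
    ("enabled", 1),
    ("records_to_preallocate", 4),
    ("max_sections_per_record", 4),
    ("max_raw_data_length", 4),
    ("error_status_address", 12),   # GAS
    ("notification_structure", 28),
    ("error_status_block_length", 4),
    ("read_ack_register", 12),      # GAS, GHESv2 only
    ("read_ack_preserve", 8),
    ("read_ack_write", 8),
]


def field_offsets(fields):
    """Return ({field: offset}, total size) for a packed structure."""
    offsets, pos = {}, 0
    for name, size in fields:
        offsets[name] = pos
        pos += size
    return offsets, pos


offsets, HEST_GHES_V2_TABLE_SIZE = field_offsets(GHES_V2_FIELDS)
GHES_ERR_ST_ADDR_OFFSET = offsets["error_status_address"] + GAS_ADDR_OFFSET
GHES_ACK_OFFSET = offsets["read_ack_register"] + GAS_ADDR_OFFSET
```

This yields 92, 20 + 4 and 64 + 4, matching the defines in the patch.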
[PATCH v9 04/12] acpi/ghes: better name GHES memory error function
The current function used to generate GHES data is specific to memory errors. Give it a better name, as we now have a generic function as well. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes-stub.c| 2 +- hw/acpi/ghes.c | 2 +- include/hw/acpi/ghes.h | 4 ++-- target/arm/kvm.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c index c315de1802d6..dd41b3fd91df 100644 --- a/hw/acpi/ghes-stub.c +++ b/hw/acpi/ghes-stub.c @@ -11,7 +11,7 @@ #include "qemu/osdep.h" #include "hw/acpi/ghes.h" -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address) +int acpi_ghes_memory_errors(uint8_t source_id, uint64_t physical_address) { return -1; } diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 3190eb954de4..10ed9c0614ff 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -494,7 +494,7 @@ void ghes_record_cper_errors(const void *cper, size_t len, cpu_physical_memory_write(cper_addr, cper, len); } -int acpi_ghes_record_errors(int source_id, uint64_t physical_address) +int acpi_ghes_memory_errors(int source_id, uint64_t physical_address) { /* Memory Error Section Type */ const uint8_t guid[] = diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index 4b5af86ec077..be53b7c53c91 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -70,7 +70,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors, const char *oem_id, const char *oem_table_id); void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s, GArray *hardware_errors); -int acpi_ghes_record_errors(int source_id, +int acpi_ghes_memory_errors(int source_id, uint64_t error_physical_addr); void ghes_record_cper_errors(const void *cper, size_t len, uint16_t source_id, Error **errp); @@ -79,7 +79,7 @@ void ghes_record_cper_errors(const void *cper, size_t len, * acpi_ghes_present: Report whether ACPI GHES table is present * * Returns: true if the system has an ACPI GHES table and it is - * safe to call 
acpi_ghes_record_errors() to record a memory error. + * safe to call acpi_ghes_memory_errors() to record a memory error. */ bool acpi_ghes_present(void); #endif diff --git a/target/arm/kvm.c b/target/arm/kvm.c index 8c4c8263b85a..8e63e9a59a5e 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -2373,7 +2373,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) */ if (code == BUS_MCEERR_AR) { kvm_cpu_synchronize_state(c); -if (!acpi_ghes_record_errors(ARM_ACPI_HEST_SRC_ID_SEA, +if (!acpi_ghes_memory_errors(ARM_ACPI_HEST_SRC_ID_SEA, paddr)) { kvm_inject_arm_sea(c); } else { -- 2.46.0
Re: [PATCH v8 13/13] acpi/ghes: check if the BIOS pointers for HEST are correct
On Sat, 24 Aug 2024 02:15:10 +0200 Mauro Carvalho Chehab wrote: > Ok, we could still do something like this pseudo-code to get the > error source offset: > > #define ACPI_HEST_TYPE_GHESV2 11 > > err_struct_offset = 0; > for (i = 0; i < source_id_count; i++) { > /* NOTE: Other types may have different sizes */ > assert(ghes[i].type == ACPI_HEST_TYPE_GHESV2); > if (ghes[i].source_id == source_id) > break; > err_struct_offset += HEST_GHES_V2_TABLE_SIZE; > } > assert (i < source_id_count); This is what I ended up implementing in v9. Regards, Mauro
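The lookup from the pseudo-code above can be sketched in Python as follows. ErrorSource and ghes_struct_offset() are hypothetical names. Like the pseudo-code, the sketch only accepts GHESv2 entries, whose size is fixed at HEST_GHES_V2_TABLE_SIZE (92 bytes). It raises an exception instead of asserting when the source ID is missing.

```python
from dataclasses import dataclass
from typing import List

ACPI_HEST_TYPE_GHESV2 = 11
HEST_GHES_V2_TABLE_SIZE = 92


@dataclass
class ErrorSource:
    type: int
    source_id: int


def ghes_struct_offset(sources: List[ErrorSource], source_id: int) -> int:
    """Byte offset of the GHESv2 entry for source_id, from the first entry."""
    offset = 0
    for src in sources:
        # Other HEST entry types may have different sizes.
        assert src.type == ACPI_HEST_TYPE_GHESV2
        if src.source_id == source_id:
            return offset
        offset += HEST_GHES_V2_TABLE_SIZE
    raise LookupError(f"HEST has no error source with id {source_id}")
```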
[PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address
Store the HEST table guest physical address (GPA) in a new fw_cfg file, placing its content in the hest_addr_le variable. Signed-off-by: Mauro Carvalho Chehab --- Change from v8: - hest_addr_le now points to the error source count and data. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 15 +++ include/hw/acpi/ghes.h | 1 + 2 files changed, 16 insertions(+) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index e9511d9b8f71..529c14e3289f 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -30,6 +30,7 @@ #define ACPI_GHES_ERRORS_FW_CFG_FILE"etc/hardware_errors" #define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr" +#define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr" /* The max size in bytes for one error block */ #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB) @@ -367,11 +368,22 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker, acpi_table_begin(&table, table_data); +int hest_offset = table_data->len; + /* Error Source Count */ build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4); build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker); acpi_table_end(linker, &table); + +/* + * tell firmware to write into GPA the address of HEST via fw_cfg, + * once initialized. 
+ */ +bios_linker_loader_write_pointer(linker, + ACPI_HEST_ADDR_FW_CFG_FILE, 0, + sizeof(uint64_t), + ACPI_BUILD_TABLE_FILE, hest_offset); } void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s, @@ -385,6 +397,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s, fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL, NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false); +fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL, +NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false); + ags->present = true; } diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index 674f6958e905..28b956acb19a 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -63,6 +63,7 @@ enum { }; typedef struct AcpiGhesState { +uint64_t hest_addr_le; uint64_t ghes_addr_le; bool present; /* True if GHES is present at all on this board */ } AcpiGhesState; -- 2.46.0
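The firmware writes the HEST guest physical address into etc/acpi_table_hest_addr as a 64-bit value, which QEMU keeps in hest_addr_le. Here is a Python sketch of turning those 8 bytes into addresses. It assumes the value is little-endian (as the _le suffix suggests) and that, since hest_offset is taken right after acpi_table_begin(), the address points at the 32-bit Error Source Count field, so the first error source structure starts 4 bytes later. Both helper names are hypothetical.

```python
import struct

ERROR_SOURCE_COUNT_SIZE = 4  # 32-bit "Error Source Count" field


def parse_hest_addr(blob: bytes) -> int:
    """Decode the 8-byte little-endian GPA stored in the fw_cfg file."""
    (gpa,) = struct.unpack("<Q", blob)
    return gpa


def first_error_source_gpa(hest_addr: int) -> int:
    """Assumes hest_addr points at Error Source Count, as hest_offset suggests."""
    return hest_addr + ERROR_SOURCE_COUNT_SIZE
```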
[PATCH v9 12/12] scripts/arm_processor_error.py: retrieve mpidr if not filled
Add support for retrieving the mpidr value via qom-get. Signed-off-by: Mauro Carvalho Chehab --- scripts/arm_processor_error.py | 27 +++ 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py index 62e0c5662232..0a16d4f0d8b1 100644 --- a/scripts/arm_processor_error.py +++ b/scripts/arm_processor_error.py @@ -5,12 +5,10 @@ # # Copyright (C) 2024 Mauro Carvalho Chehab -# TODO: current implementation has dummy defaults. -# -# For a better implementation, a QMP addition/call is needed to -# retrieve some data for ARM Processor Error injection: -# -# - ARM registers: power_state, mpidr. +# Note: currently it lacks a method to fill the ARM Processor Error CPER +# psci field from emulation. On real hardware, this is filled only +# when a CPU is not running. Implementing support for it to simulate +# real hardware is not trivial. import argparse import re @@ -174,11 +172,24 @@ def send_cper(self, args): else: cper["running-state"] = 0 +if args.mpidr: +cper["mpidr-el1"] = arg["mpidr"] +elif cpus: +cmd_arg = { +'path': cpus[0], +'property': "x-mpidr" +} +ret = qmp_cmd.send_cmd("qom-get", cmd_arg, may_open=True) +if isinstance(ret, int): +cper["mpidr-el1"] = ret +else: +cper["mpidr-el1"] = 0 + if arm_valid_init: if args.affinity: cper["valid"] |= self.arm_valid_bits["affinity"] -if args.mpidr: +if "mpidr-el1" in cper: cper["valid"] |= self.arm_valid_bits["mpidr"] if "running-state" in cper: @@ -362,7 +373,7 @@ def send_cper(self, args): if isinstance(ret, int): arg["midr-el1"] = ret -util.data_add(data, arg.get("mpidr-el1", 0), 8) +util.data_add(data, cper["mpidr-el1"], 8) util.data_add(data, arg.get("midr-el1", 0), 8) util.data_add(data, cper["running-state"], 4) util.data_add(data, arg.get("psci-state", 0), 4) -- 2.46.0
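The mpidr selection logic can also be written as a small standalone function. The Python sketch below is an illustrative variant, not the script's code. qom_get is a stand-in for the script's qmp_cmd.send_cmd("qom-get", ...) call, and the CPU path in the test is hypothetical. As a design choice that differs from the patch, it always stores a value so the fixed 8-byte field can be emitted, sets the mpidr valid bit only when a value was actually obtained, and accepts an explicit 0 from the user.

```python
from typing import Callable, Optional

ARM_VALID_MPIDR = 1 << 0  # "mpidr" bit in the script's arm_valid_bits table


def fill_mpidr(cper: dict, user_mpidr: Optional[int], cpu_path: Optional[str],
               qom_get: Callable[[str, str], object]) -> None:
    """Fill cper["mpidr-el1"], marking it valid only when a real value is known."""
    value = None
    if user_mpidr is not None:  # an explicit 0 is a legitimate value
        value = user_mpidr
    elif cpu_path:
        ret = qom_get(cpu_path, "x-mpidr")
        if isinstance(ret, int):
            value = ret
    # Always store something so the 8-byte CPER field can be written.
    cper["mpidr-el1"] = value if value is not None else 0
    if value is not None:
        cper["valid"] = cper.get("valid", 0) | ARM_VALID_MPIDR
```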
[PATCH v9 06/12] acpi/generic_event_device: add an APEI error device
Adds a generic error device to handle generic hardware error events as specified at ACPI 6.5 specification at 18.3.2.7.2: https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources using HID PNP0C33. The PNP0C33 device is used to report hardware errors to the guest via ACPI APEI Generic Hardware Error Source (GHES). Co-authored-by: Mauro Carvalho Chehab Co-authored-by: Jonathan Cameron Signed-off-by: Jonathan Cameron Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Igor Mammedov --- hw/acpi/aml-build.c| 10 ++ hw/acpi/generic_event_device.c | 8 include/hw/acpi/acpi_dev_interface.h | 1 + include/hw/acpi/aml-build.h| 2 ++ include/hw/acpi/generic_event_device.h | 1 + 5 files changed, 22 insertions(+) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index 6d4517cfbe3d..222a933a8760 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -2520,3 +2520,13 @@ Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source) return var; } + +/* ACPI 5.0b: 18.3.2.6.2 Event Notification For Generic Error Sources */ +Aml *aml_error_device(void) +{ +Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE); +aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33"))); +aml_append(dev, aml_name_decl("_UID", aml_int(0))); + +return dev; +} diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c index 15b4c3ebbf24..b4c83a089a02 100644 --- a/hw/acpi/generic_event_device.c +++ b/hw/acpi/generic_event_device.c @@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = { ACPI_GED_PWR_DOWN_EVT, ACPI_GED_NVDIMM_HOTPLUG_EVT, ACPI_GED_CPU_HOTPLUG_EVT, +ACPI_GED_ERROR_EVT, }; /* @@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev, aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE), aml_int(0x80))); break; +case ACPI_GED_ERROR_EVT: +aml_append(if_ctx, + aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE), + aml_int(0x80))); +break; case ACPI_GED_NVDIMM_HOTPLUG_EVT: 
aml_append(if_ctx, aml_notify(aml_name("\\_SB.NVDR"), @@ -295,6 +301,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev) sel = ACPI_GED_MEM_HOTPLUG_EVT; } else if (ev & ACPI_POWER_DOWN_STATUS) { sel = ACPI_GED_PWR_DOWN_EVT; +} else if (ev & ACPI_GENERIC_ERROR) { +sel = ACPI_GED_ERROR_EVT; } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) { sel = ACPI_GED_NVDIMM_HOTPLUG_EVT; } else if (ev & ACPI_CPU_HOTPLUG_STATUS) { diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h index 68d9d15f50aa..8294f8f0ccca 100644 --- a/include/hw/acpi/acpi_dev_interface.h +++ b/include/hw/acpi/acpi_dev_interface.h @@ -13,6 +13,7 @@ typedef enum { ACPI_NVDIMM_HOTPLUG_STATUS = 16, ACPI_VMGENID_CHANGE_STATUS = 32, ACPI_POWER_DOWN_STATUS = 64, +ACPI_GENERIC_ERROR = 128, } AcpiEventStatusBits; #define TYPE_ACPI_DEVICE_IF "acpi-device-interface" diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h index a3784155cb33..44d1a6af0c69 100644 --- a/include/hw/acpi/aml-build.h +++ b/include/hw/acpi/aml-build.h @@ -252,6 +252,7 @@ struct CrsRangeSet { /* Consumer/Producer */ #define AML_SERIAL_BUS_FLAG_CONSUME_ONLY(1 << 1) +#define ACPI_APEI_ERROR_DEVICE "GEDD" /** * init_aml_allocator: * @@ -382,6 +383,7 @@ Aml *aml_dma(AmlDmaType typ, AmlDmaBusMaster bm, AmlTransferSize sz, uint8_t channel); Aml *aml_sleep(uint64_t msec); Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source); +Aml *aml_error_device(void); /* Block AML object primitives */ Aml *aml_scope(const char *name_format, ...) 
G_GNUC_PRINTF(1, 2); diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h index 40af3550b56d..9ace8fe70328 100644 --- a/include/hw/acpi/generic_event_device.h +++ b/include/hw/acpi/generic_event_device.h @@ -98,6 +98,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED) #define ACPI_GED_PWR_DOWN_EVT 0x2 #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4 #define ACPI_GED_CPU_HOTPLUG_EVT0x8 +#define ACPI_GED_ERROR_EVT 0x10 typedef struct GEDState { MemoryRegion evt; -- 2.46.0
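The routing added by this patch boils down to two lookups. First, `acpi_ged_send_event()` maps an `AcpiEventStatusBits` value to a GED selector bit. Second, `build_ged_aml()` only emits a `Notify()` branch for selector bits that the board enabled. The rough Python model below uses only the constant values visible in this patch, and keeps the same relative order as the C if/else chain. The memory and CPU hotplug branches are deliberately left out, so it is a sketch rather than a full mirror of the C code.

```python
# AcpiEventStatusBits values visible in acpi_dev_interface.h (subset)
ACPI_NVDIMM_HOTPLUG_STATUS = 16
ACPI_POWER_DOWN_STATUS = 64
ACPI_GENERIC_ERROR = 128

# GED selector bits from generic_event_device.h (subset)
ACPI_GED_PWR_DOWN_EVT = 0x2
ACPI_GED_NVDIMM_HOTPLUG_EVT = 0x4
ACPI_GED_ERROR_EVT = 0x10

# Same relative order as the if/else chain in acpi_ged_send_event();
# memory and CPU hotplug branches are omitted from this sketch.
_STATUS_TO_SEL = (
    (ACPI_POWER_DOWN_STATUS, ACPI_GED_PWR_DOWN_EVT),
    (ACPI_GENERIC_ERROR, ACPI_GED_ERROR_EVT),
    (ACPI_NVDIMM_HOTPLUG_STATUS, ACPI_GED_NVDIMM_HOTPLUG_EVT),
)


def ged_select(ev):
    """Map an event status bit to a GED selector; None means 'unknown event'."""
    for status_bit, sel in _STATUS_TO_SEL:
        if ev & status_bit:
            return sel
    return None


def aml_has_branch(sel, board_events):
    """The generated _EVT method only notifies for events the board enabled."""
    return sel is not None and bool(sel & board_events)
```

For example, a board that enables `ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT` gets a branch that notifies `GEDD` with `0x80`, but no NVDIMM branch.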
Re: [PATCH v9 11/12] target/arm: add an experimental mpidr arm cpu property object
Em Sun, 25 Aug 2024 12:34:14 +0100 Peter Maydell escreveu: > On Sun, 25 Aug 2024 at 04:46, Mauro Carvalho Chehab > wrote: > > > > Accurately injecting an ARM Processor error ACPI/APEI GHES > > error record requires the value of the ARM Multiprocessor > > Affinity Register (mpidr). > > > > While ARM implements it, this is currently not visible. > > > > Add a field at CPU storing it, and place it at arm_cpu_properties > > as experimental, thus allowing it to be queried via QMP using > > qom-get function. > > > static Property arm_cpu_properties[] = { > > DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0), > > +DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0), > > DEFINE_PROP_UINT64("mp-affinity", ARMCPU, > > mp_affinity, ARM64_AFFINITY_INVALID), > > DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID), > > > > Why do we need this? The ACPI HEST tables, in particular when using GHESv2, provide several kinds of errors. Among them is the ARM Processor Error, whose Common Platform Error Record (CPER) is defined in the UEFI 2.10 spec (and earlier versions) at: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html?highlight=ghes#arm-processor-error-section Two fields of this CPER record matter here. One of them (MIDR) is mandatory; the other one (MPIDR) is optional, but needed to decode another field. So, basically, those errors need both of them. > Why is it experimental? This was a suggestion from Igor. As the QAPI for external error injection is currently experimental, it makes sense to me to keep this property experimental as well. > The later patch > seems to use it via QMP, which I'm not super enthusiastic > about -- the preexisting mpidr and mp-affinity properties are > there for code that is creating CPU objects to configure > the CPU object, not as a query interface for QOM. I saw that.
Basically, decoding by the guest OS depends on MPIDR, as explained in the description of the Error affinity level field: "For errors that can be attributed to a specific affinity level, this field defines the affinity level at which the error was produced, detected, and/or consumed. This is a value between 0 and 3. All other values (4-255) are reserved For example, a vendor may choose to define affinity levels as follows: Level 0: errors that can be precisely attributed to a specific CPU (e.g. due to a synchronous external abort) Level 1: Cache parity and/or ECC errors detected at cache of affinity level 1 (e.g. only attributed to higher level cache due to prefetching and/or error propagation) NOTE: Detailed meanings and groupings of affinity level are chip and/or platform specific. The affinity level described here must be consistent with the platform definitions used MPIDR. For cache/TLB errors, the cache/TLB level is provided by the cache/TLB error structure, which may differ from affinity level." Regards, Mauro
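To see why the affinity level is meaningless without MPIDR, it helps to split MPIDR into its affinity fields. The sketch below assumes the AArch64 `MPIDR_EL1` layout: Aff0 in bits [7:0], Aff1 in [15:8], Aff2 in [23:16], the MT bit at 24, and Aff3 in [39:32]. An "error affinity level" of N can then be related to the Aff fields of the reporting CPU. As the quoted note says, how levels group CPUs is platform-specific, so the sketch stops at decoding.

```python
def mpidr_affinity(mpidr):
    """Split an AArch64 MPIDR_EL1 value into its affinity fields."""
    return {
        "aff0": mpidr & 0xFF,
        "aff1": (mpidr >> 8) & 0xFF,
        "aff2": (mpidr >> 16) & 0xFF,
        "aff3": (mpidr >> 32) & 0xFF,
        # MT: lowest affinity level consists of logical (multithreaded) PEs
        "mt": (mpidr >> 24) & 0x1,
    }
```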
Re: [PATCH v9 11/12] target/arm: add an experimental mpidr arm cpu property object
Em Fri, 30 Aug 2024 17:27:27 +0100 Peter Maydell escreveu: > On Mon, 26 Aug 2024 at 04:12, Mauro Carvalho Chehab > wrote: > > > > Em Sun, 25 Aug 2024 12:34:14 +0100 > > Peter Maydell escreveu: > > > > > On Sun, 25 Aug 2024 at 04:46, Mauro Carvalho Chehab > > > wrote: > > > > > > > > Accurately injecting an ARM Processor error ACPI/APEI GHES > > > > error record requires the value of the ARM Multiprocessor > > > > Affinity Register (mpidr). > > > > > > > > While ARM implements it, this is currently not visible. > > > > > > > > Add a field at CPU storing it, and place it at arm_cpu_properties > > > > as experimental, thus allowing it to be queried via QMP using > > > > qom-get function. > > > > > > > static Property arm_cpu_properties[] = { > > > > DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0), > > > > +DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0), > > > > DEFINE_PROP_UINT64("mp-affinity", ARMCPU, > > > > mp_affinity, ARM64_AFFINITY_INVALID), > > > > DEFINE_PROP_INT32("node-id", ARMCPU, node_id, > > > > CPU_UNSET_NUMA_NODE_ID), > > > > > > Why do we need this? > > > > The ACPI HEST tables, in particular when using GHESv2 provide > > several kinds of errors. Among them, we have ARM Processor Error, > > as defined at UEFI 2.10 spec (and earlier versions), the Common > > Platform Error Record (CPER) is defined as: > > > > > > https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html?highlight=ghes#arm-processor-error-section > > > > There are two fields that are part of the CPER record. One of them is > > mandatory (MIDR); the other one is optional, but needed to decode another > > field. > > > > So, basically those errors need them. > > OK, but why do scripts outside of QEMU need the information, > as opposed to telling QEMU "hey, generate an error" and > QEMU knowing the format to use? Do we have any other > QMP APIs where something external provides raw ACPI > data like this? This was discussed during the review of this patch series. 
See, the ACPI Platform Error Interfaces (APEI) code currently in QEMU implements limited support for ACPI HEST - Hardware Error Source Table [1]. [1] https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#acpi-error-source HEST currently consists of 9 error source types (plus 3 obsolete ones). Among them, there is support for generic errors via the GHES and GHESv2 types. While not officially obsoleted, GHES is superseded by GHESv2. GHESv2 (and GHES) records have a section type field that identifies which kind of error they carry [2]. Currently, there are more than 10 defined UUIDs for the section type. [2] https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#section-descriptor The current code in ghes.c implements GHESv2 support for a single type (memory error), received from the host OS via SIGBUS. Testing such code and injecting such an error is not easy, as the host OS needs to send a SIGBUS to QEMU, reflecting an error on the host. Such code also has several limitations. - In the first three versions of this patch set, the code did just what you said: it added error injection for a HEST GHESv2 ARM Processor Error. So the error record (CPER) was produced in QEMU, using some optional parameters passed via QMP to change fields when needed. With such an approach, QEMU could directly use the values of MIDR and MPIDR. The main disadvantage is that fully supporting HEST would require a lot of code, adding support for every GHESv2 type and for every GHESv2 section type. So, the feedback we got was to re-implement it in a generic way.
The generic CPER error injection approach (since v4 of this series) has some advantages: - it is easy to do fuzz testing, as the entire CPER is built by a Python script; - there is no need to modify QEMU to support other GHESv2 record types or other types of processors; - GHESv2 fields can also be dynamically generated; - it shouldn't be hard to change the code to support other types of HEST error sources (currently, only GHESv2 is supported). The disadvantage is that queries are needed to fetch configuration and register values from the current emulation in order to inject errors. For ARM Processor Error, this means that MPIDR and MIDR are needed. Other processors and other error types will also require querying other data from QEMU, either using existing QMP commands or by adding new ones. Yet, the amount of code for such queries seems to be smaller than the amount of code needed for every single GHESv2/HEST type. - It is worth saying that QEMU may still need to generate HEST/GHES errors internally, to reflect hardware problems detected by the host OS into the guests. For instance, if host memory is poisoned due to hardware errors, QEMU and the guests need to know, in order to kill processes affected by the bad memory. Regards, Mauro
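With the generic approach, the injection script assembles the raw bytes itself, and this is where MIDR and MPIDR come in. The sketch below packs the section type GUID and the fixed 40-byte head of an ARM Processor Error Section. It assumes the layout from UEFI 2.10 Appendix N: validation bits, ERR_INFO_NUM, CONTEXT_INFO_NUM, section length, error affinity level plus 3 reserved bytes, MPIDR_EL1, MIDR_EL1, running state and PSCI state. It also assumes validation bits 0..2 mean MPIDR, affinity level and running state valid. The helper names are hypothetical; the field order matches the `data_add()` calls in `scripts/arm_processor_error.py` (mpidr, midr, running state, psci state).

```python
import struct
import uuid

# Section type GUIDs (UEFI 2.10 Appendix N, section descriptor "Section Type")
CPER_SEC_PROC_ARM = uuid.UUID("e19e3d16-bc11-11e4-9caa-c2051d5d46b0")
CPER_SEC_PLATFORM_MEM = uuid.UUID("a5bc1114-6f64-4ede-b863-3e83ed7c83b1")

# ARM Processor Error Section validation bits (assumed numbering)
VALID_MPIDR = 1 << 0
VALID_AFFINITY = 1 << 1
VALID_RUNNING_STATE = 1 << 2

# valid(4) err_info_num(2) ctx_info_num(2) section_len(4) affinity(1) reserved(3)
# mpidr(8) midr(8) running_state(4) psci_state(4): 40 bytes, little-endian
ARM_PROC_HDR = struct.Struct("<IHHIB3xQQII")


def section_type_bytes(guid):
    """EFI_GUID byte order: first three fields little-endian, last 8 bytes as-is."""
    return guid.bytes_le


def pack_arm_proc_header(section_len, midr, mpidr=None, affinity=None,
                         running=True, psci_state=0,
                         err_info_num=1, ctx_info_num=0):
    """Pack the fixed part of an ARM Processor Error Section.

    section_len must already include the error-information and context
    structures that the caller appends after this header.
    """
    valid = VALID_RUNNING_STATE
    if mpidr is not None:
        valid |= VALID_MPIDR
    if affinity is not None:
        valid |= VALID_AFFINITY
    return ARM_PROC_HDR.pack(
        valid, err_info_num, ctx_info_num, section_len,
        affinity if affinity is not None else 0,
        mpidr if mpidr is not None else 0,
        midr,
        1 if running else 0,
        0 if running else psci_state,  # PSCI state is only consulted when not running
    )
```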
Re: [PATCH v8 06/13] acpi/ghes: add support for generic error injection via QAPI
Em Thu, 12 Sep 2024 14:42:33 +0200 Igor Mammedov escreveu: > On Wed, 11 Sep 2024 16:34:36 +0100 > Jonathan Cameron wrote: > > > On Wed, 11 Sep 2024 15:21:32 +0200 > > Igor Mammedov wrote: > > > > > On Sun, 25 Aug 2024 05:29:23 +0200 > > > Mauro Carvalho Chehab wrote: > > > > > > > Em Mon, 19 Aug 2024 14:51:36 +0200 > > > > Igor Mammedov escreveu: > > > > > > > > > > +read_ack = 1; > > > > > > +cpu_physical_memory_write(read_ack_start_addr, > > > > > > + &read_ack, sizeof(uint64_t)); > > > > > we don't do this for SEV so, why are you setting it to 1 here? The diff context doesn't really help here. The full code is: /* zero means OSPM does not acknowledge the error */ if (!read_ack) { error_setg(errp, "Last CPER record was not acknowledged yet"); read_ack = 1; cpu_physical_memory_write(read_ack_start_addr, &read_ack, sizeof(read_ack)); return; } > > > what you are doing here by setting read_ack = 1, > > > is making ack on behalf of OSPM when OSPM haven't handled existing error > > > yet. > > > > > > Essentially making HW/FW do the job of OSPM. That looks wrong to me. > > > From HW/FW side read_ack register should be thought as read-only. > > > > It's not read-only because HW/FW has to clear it so that HW/FW can detect > > when the OSPM next writes it. > > By readonly, I've meant that hw shall not do above mentioned write > (bad phrasing on my side). The above code is actually error handling: if, for some reason, errors are triggered too fast, or there is a bug in QEMU or in the OSPM, an error message is raised and the logic resets the record to a sane state, so that OSPM will get the next error. As described in https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html?highlight=asynchronous#generic-hardware-error-source: "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, as defined in Table 18.10.
For example, a platform may describe one error source for the handling of synchronous errors (e.g. MCE or SEA), and a second source for handling asynchronous errors (e.g. SCI or External Interrupt)." Basically, the error logic there fits the asynchronous case: it detects whether another error happened before OSPM handled the first one. IMO, there are several alternatives to handle such a case: 1. Keep the code as-is: if this ever happens, an error message will be issued. If SEA/MCE gets implemented synchronously on HW/FW/OSPM, the above code will never be called; 2. Change the logic to do that only for asynchronous sources (currently, only if the source ID is QMP); 3. Add a special QMP message to reset the notification ack. It would probably take the notification type as an input parameter; 4. Implement much more complex code for asynchronous notifications, with a queue to receive HEST errors and a separate thread to deliver errors to OSPM asynchronously. If we go this way, QMP would return the number of queued error messages, allowing the error injection code to know if OSPM is having trouble with error delivery; 5. Just return an error code without doing any resets. To me, this is the worst scenario. I don't like (5), because if something bad happens, there's nothing that can be done. For QMP error injection, (4) seems overkill. It may be needed in the future if we end up implementing logic where the host OS informs the guest about hardware problems, and such errors use asynchronous notifications. I would also avoid implementing (3), at least for now, as reporting such an error via QMP seems enough for the QMP use case. So, if that is ok for you, I'll change the code to (2). > > Agreed this write to 1 looks wrong, but the one a few lines further down > > (to zero > > it) is correct. > > yep, hw should clear register.
> It would be better to so on OSPM ACK, but alas we can't intercept that, > so the next option would be to do that at the time when we add a new error > block > > > > > My bug a long time back I think. > > > > Jonathan > > > > > > > > > > > > > IMO, this is needed, independently of the notification mechanism. > > > > > > > > Regards, > > > > Mauro > > > > > > > > > > > > > Thanks, Mauro
Re: [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address
Em Wed, 11 Sep 2024 15:51:08 +0200 Igor Mammedov escreveu: > On Sun, 25 Aug 2024 05:45:56 +0200 > Mauro Carvalho Chehab wrote: > > > Store HEST table address at GPA, placing its content at > > hest_addr_le variable. > > > > Signed-off-by: Mauro Carvalho Chehab > > This looks good to me. > > in addition to this, it needs a patch on top to make sure > that we migrate hest_addr_le. > See a08a64627b6b 'ACPI: Record the Generic Error Status Block address' > and fixes on top of that for an example. Hmm... If I understand that change correctly, vmstate_ghes_state will use this structure as the basis for migration: /* ghes.h */ typedef struct AcpiGhesState { uint64_t hest_addr_le; uint64_t ghes_addr_le; bool present; /* True if GHES is present at all on this board */ } AcpiGhesState; /* generic_event_device.c */ static const VMStateDescription vmstate_ghes_state = { .name = "acpi-ged/ghes", .version_id = 1, .minimum_version_id = 1, .needed = ghes_needed, .fields = (VMStateField[]) { VMSTATE_STRUCT(ghes_state, AcpiGedState, 1, vmstate_ghes, AcpiGhesState), VMSTATE_END_OF_LIST() } }; /* hw/arm/virt-acpi-build.c */ void virt_acpi_setup(VirtMachineState *vms) { ...
if (vms->ras) { assert(vms->acpi_dev); acpi_ged_state = ACPI_GED(vms->acpi_dev); acpi_ghes_add_fw_cfg(&acpi_ged_state->ghes_state, vms->fw_cfg, tables.hardware_errors); } This relies on the acpi_ghes_add_fw_cfg() function to set up callbacks for the migration: /* ghes.c */ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s, GArray *hardware_error) { /* Create a read-only fw_cfg file for GHES */ fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data, hardware_error->len); /* Create a read-write fw_cfg file for Address */ fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL, NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false); fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL, NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false); ags->present = true; } It seems to me that both ghes_addr_le and hest_addr_le will be migrated together. Did I miss something? Thanks, Mauro
Re: [PATCH v9 02/12] acpi/ghes: rework the logic to handle HEST source ID
Em Wed, 11 Sep 2024 17:01:57 +0200 Igor Mammedov escreveu: > On Sun, 25 Aug 2024 05:45:57 +0200 > Mauro Carvalho Chehab wrote: > > > The current logic is based on a lot of duct tape, with > > offsets calculated based on one define with the number of > > source IDs and an enum. > > > > Rewrite the logic in a way that it would be more resilient > > of code changes, by moving the source ID count to an enum > > and make the offset calculus more explicit. > > > > Such change was inspired on a patch from Jonathan Cameron > > splitting the logic to get the CPER address on a separate > > function, as this will be needed to support generic error > > injection. > > patch is too large and does too many things at once, > see inline suggestions on how to split it in more > manageable chunks. > (I'll mark preferred patch order with numbers) I ended up adding more patches, to split the changes more logically. > > > > > Signed-off-by: Mauro Carvalho Chehab > > > > --- > > > > Changes from v8: > > - Non-rename/cleanup changes merged altogether; > > - source ID is now more generic, defined per guest target. > > That should make easier to add support for 86. > > > > Signed-off-by: Mauro Carvalho Chehab > > --- > > hw/acpi/ghes.c | 275 --- > > hw/arm/virt-acpi-build.c | 10 +- > > include/hw/acpi/ghes.h | 18 +-- > > include/hw/arm/virt.h| 7 + > > target/arm/kvm.c | 3 +- > > 5 files changed, 198 insertions(+), 115 deletions(-) > > > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c > > index 529c14e3289f..965fb1b36587 100644 > > --- a/hw/acpi/ghes.c > > +++ b/hw/acpi/ghes.c > > @@ -35,9 +35,6 @@ > > /* The max size in bytes for one error block */ > > #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB) > > > > -/* Now only support ARMv8 SEA notification type error source */ > > -#define ACPI_GHES_ERROR_SOURCE_COUNT1 > > [patch 4] getting rid of this and introducing num_sources > (aka variable size HEST) Ok.
> > > /* Generic Hardware Error Source version 2 */ > > #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10 > > > > @@ -64,6 +61,19 @@ > > */ > > #define ACPI_GHES_GESB_SIZE 20 > > > > +/* > > + * Offsets with regards to the start of the HEST table stored at > > + * ags->hest_addr_le, according with the memory layout map at > > + * docs/specs/acpi_hest_ghes.rst. > > + */ > perhaps mention in comment/commit message, that hest lookup > is implemented only GHESv2 error sources. Ok, I will add a comment, but IMO it fits better in the routine that handles HEST error sources, so I added it there: /* * Currently, HEST error source navigation only supports GHESv2 tables */ for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) { uint64_t addr = err_source_struct; uint16_t type, src_id; ... > > That will work as far we do forward migration only > (i.e. old qemu -> new qemu), which is what upstream supports. > > However it won't work for backward migration (new qemu -> old qemu) > since old one doesn't know about new non-GHESv2 sources. > And that means we would need to introduce compat knobs for every > new non-GHESv2 source is added. Which is easy to overlook and > it adds up to maintenance. > (You've already described zoo of types ACPI spec has in v8 review, > but I don't thing it's too complex to implement lookup of all > known types. compared to headache we would have with compat > settings if anyone remembers) > > I won't insist on adding all known sources lookup in this series, > if you agree to do it as a patch on top of this series within this > dev cycle (~2 months time-frame). Doing it as a patch on top within this dev cycle seems fine to me. > > +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2 */ > > + ,Table 18-383 > > > +#define HEST_GHES_V2_TABLE_SIZE 92 > > +#define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET) > > + > > +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source */ > >Table 18-380 'Error Status Address' field Actually, in ACPI 6.2, those tables are 18-382 and 18-379.
I'll change the above to reflect that: /* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2 * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure */ #define HEST_GHES_V2_TABLE_SIZE 92 #define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET) /* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source * Table 18-379: 'Error S
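The GHESv2-only lookup discussed above can be modelled in Python. The sketch walks a buffer that starts at the 4-byte error source count; this assumes the 36-byte ACPI header has already been skipped. The walk returns the addresses stored in the Error Status Address and Read Ack Register fields of a given source ID. It uses the numbers from this patch (92-byte GHESv2 entries, Read Ack GAS at offset 64, type 10). It also assumes the Error Status Address GAS sits at offset 20 and that the 64-bit address sits 4 bytes into a Generic Address Structure. Like the C routine, it refuses anything that is not GHESv2.

```python
import struct

GHES_V2_TYPE = 10                 # ACPI_GHES_SOURCE_GENERIC_ERROR_V2
HEST_GHES_V2_TABLE_SIZE = 92
GAS_ADDR_OFFSET = 4               # address field inside a Generic Address Structure
ERROR_STATUS_ADDR_OFFSET = 20 + GAS_ADDR_OFFSET
GHES_ACK_OFFSET = 64 + GAS_ADDR_OFFSET


def find_ghes_v2(blob, source_id):
    """Return (error_status_addr, read_ack_addr) for source_id.

    blob: 4-byte error source count followed by the error source entries.
    Only GHESv2 entries are understood, so any other type aborts the walk.
    """
    (count,) = struct.unpack_from("<I", blob, 0)
    off = 4
    for _ in range(count):
        etype, src_id = struct.unpack_from("<HH", blob, off)
        if etype != GHES_V2_TYPE:
            raise ValueError(f"unsupported HEST error source type {etype}")
        if src_id == source_id:
            (status,) = struct.unpack_from("<Q", blob, off + ERROR_STATUS_ADDR_OFFSET)
            (ack,) = struct.unpack_from("<Q", blob, off + GHES_ACK_OFFSET)
            return status, ack
        off += HEST_GHES_V2_TABLE_SIZE
    raise KeyError(source_id)
```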
Re: [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address
igration for following cases: > 1. (ping-pong case with OLD firmware/ACPI tables) > start old-qemu with 9.1 machine type -> >migrate to file -> >start new-qemu with 9.1 machine type -> restore from file -> >migrate to file -> As I have never used migration, I'm a little stuck on the command-line parameters. I think I found the one to do the migration from the monitor: (qemu) migrate file://tmp/migrate But I have no idea how to start a machine from a saved state. >start old-qemu with 9.1 machine type ->restore from file -> > > 2. (ping-pong case with NEW firmware/ACPI tables) > do the same as #1 but starting with new-qemu binary > > (from upstream pov #2 is optional, but not implementing it > is pain for downstream so it's better to have it if it's not > too much work) If I understood the migration documentation correctly, whenever new fields are added, .version_id should be incremented. If the new version is not backward-compatible, .minimum_version_id is also incremented. So, for migration compatibility with a 9.1 VM, the code needs to handle the case where hest_addr_le is not set, e.g. by using offsets relative to ghes_addr_le, just like the current version does: uint64_t cper_addr, read_ack_start_addr; AcpiGedState *acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL)); AcpiGhesState *ags = &acpi_ged_state->ghes_state; if (!ags->hest_addr_le) { // Backward-compatible migration code uint64_t base = le64_to_cpu(ags->ghes_addr_le); read_ack_start_addr = base + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) + error_source_to_index[notify] * sizeof(uint64_t); cper_addr = base + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) + error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH; } else { // Use the new logic from ags->hest_addr_le } There are two problems with that: 1. From your reviews, if I understood correctly, the code above is not migration-safe.
So, while implementing it would be technically correct, migration still won't work; 2. With the new code, ACPI_GHES_ERROR_SOURCE_COUNT is not defined anymore, as the number of error sources can differ between architectures: it is 2 for new VMs and 1 for old ones. Basically, the new code gets it right because it can see a pointer to the HEST table, so it can get the number from there: hest_addr = le64_to_cpu(ags->hest_addr_le); cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources)); But, without hest_addr_le, getting num_sources is not possible. An alternative would be to add hacky code that works only for Arm machines (as new versions may support more archs). Something like: #define V1_ARM_ACPI_GHES_ERROR_SOURCE_COUNT 1 #define V2_ARM_ACPI_GHES_ERROR_SOURCE_COUNT 2 And then have hardcoded logic that would work before and after this changeset, but may break in newer versions if the number of source IDs changes, if other HEST types are added, etc. Even assuming such a hack would work, it is too hacky for my taste. So, IMO it is a lot safer not to support migrations from v1 (only ghes_addr_le), using a patch like the enclosed one to ensure that. Btw, checking existing migration structs, it seems that for almost all structures .version_id is identical to .minimum_version_id, meaning that migration between different versions isn't supported in most cases. Thanks, Mauro --- [PATCH] acpi/generic_event_device: Update GHES migration to cover hest addr The GHES migration logic at GED should now support HEST table location too. Increase migration version and change needed to check for both ghes_addr_le and hest_addr_le.
Signed-off-by: Mauro Carvalho Chehab diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c index b4c83a089a02..efae0ff62c7b 100644 --- a/hw/acpi/generic_event_device.c +++ b/hw/acpi/generic_event_device.c @@ -351,10 +351,11 @@ static const VMStateDescription vmstate_ged_state = { static const VMStateDescription vmstate_ghes = { .name = "acpi-ghes", -.version_id = 1, -.minimum_version_id = 1, +.version_id = 2, +.minimum_version_id = 2, .fields = (const VMStateField[]) { VMSTATE_UINT64(ghes_addr_le, AcpiGhesState), +VMSTATE_UINT64(hest_addr_le, AcpiGhesState), VMSTATE_END_OF_LIST() }, }; @@ -362,13 +363,13 @@ static const VMStateDescription vmstate_ghes = { static bool ghes_needed(void *opaque) { AcpiGedState *s = opaque; -return s->ghes_state.ghes_addr_le; +return s->ghes_state.ghes_addr_le && s->ghes_state.hest_addr_le; } static const VMStateDescription vmstate_ghes_state = { .name = "acpi-ged/ghes", -.version_id = 1, -.minimum_version_id = 1, +.version_id = 2, +.minimum_version_id = 2, .needed = ghes_needed, .fields = (const VMStateField[]) { VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
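The backward-compatible offset computation quoted in the mail can be checked in isolation. The sketch below mirrors that formula under two assumptions: the pre-HEST layout is [error block address x N][read ack x N][error blocks x N], and each error block is `ACPI_GHES_MAX_RAW_DATA_LENGTH` (1 KiB) long. It shows the problem raised in point 2. The same `ghes_addr_le` yields different addresses depending on the source count, and an old migration stream does not carry that count.

```python
ACPI_GHES_MAX_RAW_DATA_LENGTH = 1024  # 1 * KiB, as in hw/acpi/ghes.c


def legacy_ghes_addrs(ghes_addr_le, source_count, index):
    """Return (read_ack_addr, cper_addr) using the pre-HEST layout formula."""
    base = int.from_bytes(ghes_addr_le, "little")   # like le64_to_cpu()
    read_ack = base + source_count * 8 + index * 8
    cper = base + 2 * source_count * 8 + index * ACPI_GHES_MAX_RAW_DATA_LENGTH
    return read_ack, cper
```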
Re: [PATCH v8 06/13] acpi/ghes: add support for generic error injection via QAPI
Em Fri, 13 Sep 2024 14:28:02 +0200 Igor Mammedov escreveu: > > > 5. Just return an error code without doing any resets. To me, this is > > >the worse scenario. > > > > > > I don't like (5), as if something bad happens, there's nothing to be > > > done. > > > > If it happens on a real system nothing is done either. So I'm not sure > > we need to handle that. Or maybe real hardware reinjects the interrupt > > if the OSPM hasn't done anything about it for a while. > > > > > > > > For QMP error injection (4) seems is overkill. It may be needed in the > > > future if we end implementing a logic where host OS informs guest about > > > hardware problems, and such errors use asynchronous notifications. > > > > > > I would also avoid implementing (3) at least for now, as reporting > > > such error via QMP seems enough for the QMP usecase. > > > > > > So, if ok for you, I'll change the code to (2). > > > > Whilst I don't feel strongly about it, I think 5 is unfortunately the > > correct option if we aren't going to queue errors in qemu (so make it > > an injection tool problem). > > +1 to option (5) Ok, will do (5) then. Thanks, Mauro
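Option (5) keeps the Read Ack register under the guest's control. QEMU only clears it when it writes a new record, and it refuses to write while the previous record is unacknowledged. The toy model below captures that handshake. It assumes the register starts at 1, so the first record can be written after boot. The class and method names are hypothetical.

```python
class GhesErrorSource:
    """Toy model of one GHESv2 source: an error status block plus Read Ack."""

    def __init__(self):
        self.read_ack = 1   # 1: previous record consumed by OSPM (or none yet)
        self.cper = b""

    def record(self, cper):
        # Option (5): report an error and leave read_ack untouched
        if self.read_ack == 0:
            raise RuntimeError("Last CPER record was not acknowledged yet")
        self.read_ack = 0   # HW/FW clears it so the next OSPM write is detectable
        self.cper = cper

    def ospm_ack(self):
        # Guest OSPM writes the Read Ack register after consuming the record
        self.read_ack = 1
```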
[PATCH v10 20/21] target/arm: add an experimental mpidr arm cpu property object
Accurately injecting an ARM Processor error ACPI/APEI GHES error record requires the value of the ARM Multiprocessor Affinity Register (mpidr). While ARM implements it, this is currently not visible. Add a field at CPU storing it, and place it at arm_cpu_properties as experimental, thus allowing it to be queried via QMP using qom-get function. Signed-off-by: Mauro Carvalho Chehab --- target/arm/cpu.c| 1 + target/arm/cpu.h| 1 + target/arm/helper.c | 10 -- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/target/arm/cpu.c b/target/arm/cpu.c index 19191c239181..30fcf0a10f46 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -2619,6 +2619,7 @@ static ObjectClass *arm_cpu_class_by_name(const char *cpu_model) static Property arm_cpu_properties[] = { DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0), +DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0), DEFINE_PROP_UINT64("mp-affinity", ARMCPU, mp_affinity, ARM64_AFFINITY_INVALID), DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID), diff --git a/target/arm/cpu.h b/target/arm/cpu.h index f065756c5c7d..bf8e5943af4f 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -1033,6 +1033,7 @@ struct ArchCPU { uint64_t reset_pmcr_el0; } isar; uint64_t midr; +uint64_t mpidr; uint32_t revidr; uint32_t reset_fpsid; uint64_t ctr; diff --git a/target/arm/helper.c b/target/arm/helper.c index 0a582c1cd3b3..d6e7aa069489 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -4690,7 +4690,7 @@ static uint64_t mpidr_read_val(CPUARMState *env) return mpidr; } -static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri) +static uint64_t mpidr_read(CPUARMState *env) { unsigned int cur_el = arm_current_el(env); @@ -4700,6 +4700,11 @@ static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri) return mpidr_read_val(env); } +static uint64_t mpidr_read_ri(CPUARMState *env, const ARMCPRegInfo *ri) +{ +return mpidr_read(env); +} + static const ARMCPRegInfo lpae_cp_reginfo[] = { /* NOP AMAIR0/1 */ 
{ .name = "AMAIR0", .state = ARM_CP_STATE_BOTH, @@ -9721,7 +9726,7 @@ void register_cp_regs_for_features(ARMCPU *cpu) { .name = "MPIDR_EL1", .state = ARM_CP_STATE_BOTH, .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 5, .fgt = FGT_MPIDR_EL1, - .access = PL1_R, .readfn = mpidr_read, .type = ARM_CP_NO_RAW }, + .access = PL1_R, .readfn = mpidr_read_ri, .type = ARM_CP_NO_RAW }, }; #ifdef CONFIG_USER_ONLY static const ARMCPRegUserSpaceInfo mpidr_user_cp_reginfo[] = { @@ -9731,6 +9736,7 @@ void register_cp_regs_for_features(ARMCPU *cpu) modify_arm_cp_regs(mpidr_cp_reginfo, mpidr_user_cp_reginfo); #endif define_arm_cp_regs(cpu, mpidr_cp_reginfo); +cpu->mpidr = mpidr_read(env); } if (arm_feature(env, ARM_FEATURE_AUXCR)) { -- 2.46.0
[PATCH v10 14/21] acpi/ghes: add a notifier to notify when error data is ready
Some error injection notify methods are async, like GPIO notify. Add a notifier to be used when the error record is ready to be sent to the guest OS. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 5 + include/hw/acpi/ghes.h | 3 +++ 2 files changed, 8 insertions(+) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index a8feb39c9f30..7bea265c7ef3 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -409,6 +409,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s, ags->present = true; } +NotifierList acpi_generic_error_notifiers = +NOTIFIER_LIST_INITIALIZER(acpi_generic_error_notifiers); + void ghes_record_cper_errors(const void *cper, size_t len, uint16_t source_id, Error **errp) { @@ -499,6 +502,8 @@ void ghes_record_cper_errors(const void *cper, size_t len, /* Write the generic error data entry into guest memory */ cpu_physical_memory_write(cper_addr, cper, len); + +notifier_list_notify(&acpi_generic_error_notifiers, NULL); } int acpi_ghes_memory_errors(int source_id, uint64_t physical_address) diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index 7a7961e6078a..83c912338137 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -24,6 +24,9 @@ #include "hw/acpi/bios-linker-loader.h" #include "qapi/error.h" +#include "qemu/notify.h" + +extern NotifierList acpi_generic_error_notifiers; /* * Values for Hardware Error Notification Type field -- 2.46.0
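The point of this notifier is ordering. The CPER is written into guest memory first, and only then are listeners told the record is ready. A listener such as the board's handler can then raise the ACPI GED event. The sketch below models that flow with a toy list of callbacks. It does not reproduce QEMU's `NotifierList` call order, and the `trace` list and helper names are hypothetical.

```python
class NotifierList:
    """Toy stand-in for QEMU's NotifierList (call order is not modelled)."""

    def __init__(self):
        self._callbacks = []

    def add(self, callback):
        self._callbacks.append(callback)

    def notify(self, data=None):
        for callback in list(self._callbacks):
            callback(data)


ACPI_GENERIC_ERROR = 128  # AcpiEventStatusBits value for generic errors

acpi_generic_error_notifiers = NotifierList()
trace = []  # order of guest-visible side effects


def record_cper(guest_memory, cper_addr, cper):
    # 1. write the error data where the guest will look for it ...
    guest_memory[cper_addr] = cper
    trace.append("cper-written")
    # 2. ... and only then tell listeners that the record is ready
    acpi_generic_error_notifiers.notify(None)


def board_listener(_data):
    # Board side: forward the notification as an ACPI generic error event
    trace.append(("acpi_send_event", ACPI_GENERIC_ERROR))


acpi_generic_error_notifiers.add(board_listener)
```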
[PATCH v10 02/21] acpi/generic_event_device: Update GHES migration to cover hest addr
The GHES migration logic in the GED should now also cover the HEST table location. Increase the migration version and change ghes_needed() to check for both ghes_addr_le and hest_addr_le. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/generic_event_device.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c index 15b4c3ebbf24..4e5e387ee2df 100644 --- a/hw/acpi/generic_event_device.c +++ b/hw/acpi/generic_event_device.c @@ -343,10 +343,11 @@ static const VMStateDescription vmstate_ged_state = { static const VMStateDescription vmstate_ghes = { .name = "acpi-ghes", -.version_id = 1, -.minimum_version_id = 1, +.version_id = 2, +.minimum_version_id = 2, .fields = (const VMStateField[]) { VMSTATE_UINT64(ghes_addr_le, AcpiGhesState), +VMSTATE_UINT64(hest_addr_le, AcpiGhesState), VMSTATE_END_OF_LIST() }, }; @@ -354,13 +355,13 @@ static const VMStateDescription vmstate_ghes = { static bool ghes_needed(void *opaque) { AcpiGedState *s = opaque; -return s->ghes_state.ghes_addr_le; +return s->ghes_state.ghes_addr_le && s->ghes_state.hest_addr_le; } static const VMStateDescription vmstate_ghes_state = { .name = "acpi-ged/ghes", -.version_id = 1, -.minimum_version_id = 1, +.version_id = 2, +.minimum_version_id = 2, .needed = ghes_needed, .fields = (const VMStateField[]) { VMSTATE_STRUCT(ghes_state, AcpiGedState, 1, -- 2.46.0
[PATCH v10 16/21] arm/virt: Wire up a GED error device for ACPI / GHES
Adds support to ARM virtualization to allow handling generic error ACPI Event via GED & error source device. It is aligned with Linux Kernel patch: https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.hu...@intel.com/ Co-authored-by: Mauro Carvalho Chehab Co-authored-by: Jonathan Cameron Signed-off-by: Jonathan Cameron Signed-off-by: Mauro Carvalho Chehab Acked-by: Igor Mammedov --- Changes from v8: - Added a call to the function that produces GHES generic records, as this is now added earlier in this series. Signed-off-by: Mauro Carvalho Chehab --- hw/arm/virt-acpi-build.c | 1 + hw/arm/virt.c| 12 +++- include/hw/arm/virt.h| 1 + 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 476c365851c4..b0606434c972 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -858,6 +858,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) } acpi_dsdt_add_power_button(scope); +aml_append(scope, aml_error_device()); #ifdef CONFIG_TPM acpi_dsdt_add_tpm(scope, vms); #endif diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 7934b2365163..d970893a3db6 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -677,7 +677,7 @@ static inline DeviceState *create_acpi_ged(VirtMachineState *vms) DeviceState *dev; MachineState *ms = MACHINE(vms); int irq = vms->irqmap[VIRT_ACPI_GED]; -uint32_t event = ACPI_GED_PWR_DOWN_EVT; +uint32_t event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT; if (ms->ram_slots) { event |= ACPI_GED_MEM_HOTPLUG_EVT; @@ -1009,6 +1009,13 @@ static void virt_powerdown_req(Notifier *n, void *opaque) } } +static void virt_generic_error_req(Notifier *n, void *opaque) +{ +VirtMachineState *s = container_of(n, VirtMachineState, generic_error_notifier); + +acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR); +} + static void create_gpio_keys(char *fdt, DeviceState *pl061_dev, uint32_t phandle) { @@ -2389,6 +2396,9 @@ static void machvirt_init(MachineState *machine) 
if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) { vms->acpi_dev = create_acpi_ged(vms); +vms->generic_error_notifier.notify = virt_generic_error_req; +notifier_list_add(&acpi_generic_error_notifiers, + &vms->generic_error_notifier); } else { create_gpio_devices(vms, VIRT_GPIO, sysmem); } diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index aca4f8061b18..24ab84cd623d 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -176,6 +176,7 @@ struct VirtMachineState { DeviceState *gic; DeviceState *acpi_dev; Notifier powerdown_notifier; +Notifier generic_error_notifier; PCIBus *bus; char *oem_id; char *oem_table_id; -- 2.46.0
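The notifier wiring in patches 14 and 16 follows QEMU's usual observer pattern: ghes_record_cper_errors() walks a NotifierList, and the virt machine recovers its own state from the embedded Notifier with container_of(). The sketch below is a simplified, standalone re-implementation for illustration only; the singly linked list, `MachineSketch`, `machine_generic_error_req()` and `record_cper_done()` are hypothetical and do not reproduce qemu/notify.h, which uses QLIST internally.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for QEMU's Notifier / NotifierList. */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *n, void *data);
    Notifier *next;
};

typedef struct {
    Notifier *head;
} NotifierList;

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

void notifier_list_add(NotifierList *list, Notifier *n)
{
    n->next = list->head;
    list->head = n;
}

void notifier_list_notify(NotifierList *list, void *data)
{
    for (Notifier *n = list->head; n; n = n->next) {
        n->notify(n, data);
    }
}

/* Hypothetical machine state embedding its notifier, as VirtMachineState does. */
typedef struct {
    int ged_events;              /* counts what acpi_send_event() would raise */
    Notifier generic_error_notifier;
} MachineSketch;

/* Consumer side: plays the role of virt_generic_error_req(). */
void machine_generic_error_req(Notifier *n, void *opaque)
{
    MachineSketch *s = container_of(n, MachineSketch, generic_error_notifier);

    (void)opaque;                /* the source ID, unused in this sketch */
    s->ged_events++;
}

/* Hypothetical producer side: called once the CPER is in guest memory. */
void record_cper_done(NotifierList *list, int source_id)
{
    notifier_list_notify(list, &source_id);
}
```

A caller fills in `.notify`, registers the notifier once at machine init, and relies on the producer to fire it after each CPER write. Patch 17 later passes `&source_id` as the notifier data, which `record_cper_done()` mirrors.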
[PATCH v10 09/21] acpi/ghes: Don't hardcode the number of sources on ghes
The number of sources is architecture-dependent. Usually, architectures will implement one synchronous and/or one asynchronous source. Change the logic to better cope with such a model. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 53 +++- hw/arm/virt-acpi-build.c | 5 include/hw/acpi/ghes.h | 21 ++-- 3 files changed, 49 insertions(+), 30 deletions(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 4e34b16a6ca9..c88717fb1bef 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -251,17 +251,18 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address, * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs. * See docs/specs/acpi_hest_ghes.rst for blobs format. */ -static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) +static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker, + int num_sources) { int i, error_status_block_offset; /* Build error_block_address */ -for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) { +for (i = 0; i < num_sources; i++) { build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t)); } /* Build read_ack_register */ -for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) { +for (i = 0; i < num_sources; i++) { /* * Initialize the value of read_ack_register to 1, so GHES can be * writable after (re)boot.
@@ -276,13 +277,13 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) /* Reserve space for Error Status Data Block */ acpi_data_push(hardware_errors, -ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT); +ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources); /* Tell guest firmware to place hardware_errors blob into RAM */ bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_errors, sizeof(uint64_t), false); -for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) { +for (i = 0; i < num_sources; i++) { /* * Tell firmware to patch error_block_address entries to point to * corresponding "Generic Error Status Block" @@ -304,10 +305,12 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) /* Build Generic Hardware Error Source version 2 (GHESv2) */ static void build_ghes_v2(GArray *table_data, BIOSLinker *linker, - enum AcpiGhesNotifyType notify, - uint16_t source_id) + const AcpiNotificationSourceId *notif_src, + uint16_t index, int num_sources) { uint64_t address_offset; +const uint16_t notify = notif_src->notify; +const uint16_t source_id = notif_src->source_id; /* * Type: @@ -336,7 +339,7 @@ static void build_ghes_v2(GArray *table_data, 4 /* QWord access */, 0); bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE, address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t), -ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t)); +ACPI_GHES_ERRORS_FW_CFG_FILE, index * sizeof(uint64_t)); /* Notification Structure */ build_ghes_hw_error_notification(table_data, notify); @@ -353,9 +356,10 @@ static void build_ghes_v2(GArray *table_data, build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0, 4 /* QWord access */, 0); bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE, -address_offset + GAS_ADDR_OFFSET, -sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, -(ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * sizeof(uint64_t)); + address_offset + GAS_ADDR_OFFSET, + sizeof(uint64_t), + 
ACPI_GHES_ERRORS_FW_CFG_FILE, + (num_sources + index) * sizeof(uint64_t)); /* * Read Ack Preserve field @@ -370,12 +374,15 @@ static void build_ghes_v2(GArray *table_data, /* Build Hardware Error Source Table */ void acpi_build_hest(GArray *table_data, GArray *hardware_errors, BIOSLinker *linker, + const AcpiNotificationSourceId * const notif_source, + int num_sources, const char *oem_id, const char *oem_table_id) { AcpiTable table = { .sig = "HEST", .rev = 1, .oem_id = oem_id, .oem_table_id = oem_table_id }; +int i; -build_ghes_error_table(hardware_errors, linker); +build_ghes_error_table(hardware_errors, linker, num_sources); acpi_table_begin(&table, table_data); @@ -383,9 +390,10 @@ void acpi_build_hes
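With the hardcoded count gone, the offsets inside the "etc/hardware_errors" blob depend only on `num_sources` and the source index: first an array of `error_block_address` entries, then an array of `read_ack_register` entries, then one 1 KiB Error Status Data Block per source. The helpers below are a hypothetical sketch of that arithmetic, assuming the status blocks start right after the two 64-bit arrays (as the reservation order in build_ghes_error_table() suggests); they are not functions from hw/acpi/ghes.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors ACPI_GHES_MAX_RAW_DATA_LENGTH (1 KiB) from hw/acpi/ghes.c. */
#define MAX_RAW_DATA_LENGTH 1024u

/* Offset of the error_block_address entry for source @index. */
size_t ghes_error_block_addr_offset(unsigned index)
{
    return index * sizeof(uint64_t);
}

/* Offset of the read_ack_register entry: (num_sources + index) * 8. */
size_t ghes_read_ack_offset(unsigned index, unsigned num_sources)
{
    return (num_sources + index) * sizeof(uint64_t);
}

/* Offset of the Error Status Data Block reserved for source @index. */
size_t ghes_status_block_offset(unsigned index, unsigned num_sources)
{
    return 2 * num_sources * sizeof(uint64_t) + index * MAX_RAW_DATA_LENGTH;
}

/* Total size of the blob: two 64-bit arrays plus one data block per source. */
size_t ghes_blob_size(unsigned num_sources)
{
    return num_sources * (2 * sizeof(uint64_t) + MAX_RAW_DATA_LENGTH);
}
```

For two sources, the second source's read_ack_register sits at byte 24 and its data block at byte 1056, ending exactly at the 2080-byte end of the blob.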
[PATCH v10 07/21] acpi/ghes: rework the logic to handle HEST source ID
The current logic is based on a lot of duct tape, with offsets calculated from one define holding the number of source IDs and an enum. Rewrite the logic to be more resilient to code changes, by moving the source ID count into an enum and making the offset calculation more explicit. This change was inspired by a patch from Jonathan Cameron that split the logic to get the CPER address into a separate function, as this will be needed to support generic error injection. Signed-off-by: Mauro Carvalho Chehab --- Changes from v9: - patch split into multiple patches to avoid mixing several changes in the same patch. Changes from v8: - Non-rename/cleanup changes merged altogether; - source ID is now more generic, defined per guest target. That should make it easier to add support for x86. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 95 ++ include/hw/acpi/ghes.h | 6 ++- 2 files changed, 73 insertions(+), 28 deletions(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 36fe5f68782f..6e5f0e8cf0c9 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -61,6 +61,23 @@ */ #define ACPI_GHES_GESB_SIZE 20 +/* + * Offsets relative to the start of the HEST table stored at + * ags->hest_addr_le, according to the memory layout map at + * docs/specs/acpi_hest_ghes.rst.
+ */ + +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2 + * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure + */ +#define HEST_GHES_V2_TABLE_SIZE 92 +#define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET) + +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source + * Table 18-380: 'Error Status Address' field + */ +#define GHES_ERR_ST_ADDR_OFFSET (20 + GAS_ADDR_OFFSET) + /* * Values for error_severity field */ @@ -401,11 +418,13 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s, int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address) { -uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0; -uint64_t start_addr; -bool ret = -1; +uint64_t hest_read_ack_start_addr, read_ack_start_addr; +uint64_t hest_addr, cper_addr, err_source_struct; +uint64_t hest_err_block_addr, error_block_addr; +uint32_t i, ret; AcpiGedState *acpi_ged_state; AcpiGhesState *ags; +uint64_t read_ack; assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT); @@ -414,44 +433,66 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address) g_assert(acpi_ged_state); ags = &acpi_ged_state->ghes_state; -start_addr = le64_to_cpu(ags->ghes_addr_le); +hest_addr = le64_to_cpu(ags->hest_addr_le); if (!physical_address) { return -1; } -start_addr += source_id * sizeof(uint64_t); +err_source_struct = hest_addr + sizeof(uint32_t); -cpu_physical_memory_read(start_addr, &error_block_addr, -sizeof(error_block_addr)); +/* + * Currently, HEST error source navigation only supports GHESv2 tables + */ +for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) { +uint64_t addr = err_source_struct; +uint16_t type, src_id; -error_block_addr = le64_to_cpu(error_block_addr); +cpu_physical_memory_read(addr, &type, sizeof(type)); -read_ack_register_addr = start_addr + -ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t); +/* For now, we only know the size of GHESv2 table */ +assert(type == ACPI_GHES_SOURCE_GENERIC_ERROR_V2);
-cpu_physical_memory_read(read_ack_register_addr, -&read_ack_register, sizeof(read_ack_register)); +/* It is GHESv2. Compare its source ID */ +addr += sizeof(type); +cpu_physical_memory_read(addr, &src_id, sizeof(src_id)); -/* zero means OSPM does not acknowledge the error */ -if (!read_ack_register) { -error_report("OSPM does not acknowledge previous error," -" so can not record CPER for current error anymore"); -} else if (error_block_addr) { -read_ack_register = cpu_to_le64(0); -/* - * Clear the Read Ack Register, OSPM will write it to 1 when - * it acknowledges this error. - */ -cpu_physical_memory_write(read_ack_register_addr, -&read_ack_register, sizeof(uint64_t)); +if (src_id == source_id) { +break; +} -ret = acpi_ghes_record_mem_error(error_block_addr, -physical_address); -} else { +err_source_struct += HEST_GHES_V2_TABLE_SIZE; +} +if (i == ACPI_GHES_ERROR_SOURCE_COUNT) { error_report("can not find Generic Error Status Block"); + +return -1; } +/* Navigate through table address pointers */ + +hest_err_block
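Patch 07 locates a source by walking the GHESv2 entries of the HEST in guest memory. The following is a standalone sketch of that walk over a plain byte buffer, assuming little-endian fields, 92-byte GHESv2 entries, the Type in the first 16 bits and the Source ID in the next 16, and `GAS_ADDR_OFFSET` equal to 4. It explicitly skips the 32-bit Error Source Count field that hest_addr_le points at (per patch 01/21) before reading the first entry. `hest_find_ghes_v2()` and the read helpers are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define GHES_SOURCE_GENERIC_ERROR_V2 10   /* ACPI_GHES_SOURCE_GENERIC_ERROR_V2 */
#define HEST_GHES_V2_TABLE_SIZE      92
#define GAS_ADDR_OFFSET              4    /* assumed, as in QEMU's aml-build.h */
#define GHES_ERR_ST_ADDR_OFFSET      (20 + GAS_ADDR_OFFSET)
#define GHES_ACK_OFFSET              (64 + GAS_ADDR_OFFSET)

/* Little-endian reads, independent of host byte order. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return rd16(p) | ((uint32_t)rd16(p + 2) << 16);
}

/*
 * Return the offset (from the start of @hest) of the GHESv2 entry whose
 * Source ID equals @source_id, or -1 if there is none. @hest points at
 * the 32-bit Error Source Count, followed by the error source entries.
 */
long hest_find_ghes_v2(const uint8_t *hest, size_t len, uint16_t source_id)
{
    if (len < sizeof(uint32_t)) {
        return -1;
    }

    uint32_t count = rd32(hest);
    size_t off = sizeof(uint32_t);          /* skip Error Source Count */

    for (uint32_t i = 0; i < count; i++, off += HEST_GHES_V2_TABLE_SIZE) {
        if (off + HEST_GHES_V2_TABLE_SIZE > len) {
            return -1;                      /* truncated table */
        }
        if (rd16(hest + off) != GHES_SOURCE_GENERIC_ERROR_V2) {
            return -1;                      /* only the GHESv2 size is known */
        }
        if (rd16(hest + off + 2) == source_id) {
            return (long)off;
        }
    }
    return -1;
}
```

Once an entry is found at offset `off`, the Error Status Address and Read Ack Register addresses would be read from `off + GHES_ERR_ST_ADDR_OFFSET` and `off + GHES_ACK_OFFSET`.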
[PATCH v10 08/21] acpi/ghes: Change the type for source_id
The HEST source ID is actually a 16-bit value. Still, make the type a little more generic by using a plain int. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes-stub.c | 2 +- hw/acpi/ghes.c | 2 +- include/hw/acpi/ghes.h | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c index c315de1802d6..58a04e935142 100644 --- a/hw/acpi/ghes-stub.c +++ b/hw/acpi/ghes-stub.c @@ -11,7 +11,7 @@ #include "qemu/osdep.h" #include "hw/acpi/ghes.h" -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address) +int acpi_ghes_record_errors(int source_id, uint64_t physical_address) { return -1; } diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 6e5f0e8cf0c9..4e34b16a6ca9 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -416,7 +416,7 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s, ags->present = true; } -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address) +int acpi_ghes_record_errors(int source_id, uint64_t physical_address) { uint64_t hest_read_ack_start_addr, read_ack_start_addr; uint64_t hest_addr, cper_addr, err_source_struct; diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index 7485f54c28f2..6471934d7775 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -75,7 +75,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors, const char *oem_id, const char *oem_table_id); void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s, GArray *hardware_errors); -int acpi_ghes_record_errors(uint8_t source_id, +int acpi_ghes_record_errors(int source_id, uint64_t error_physical_addr); void ghes_record_cper_errors(const void *cper, size_t len, uint16_t source_id, Error **errp); -- 2.46.0
[PATCH v10 03/21] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
This is just duplicating ACPI_GHES_ERROR_SOURCE_COUNT, which has a better name. So, drop the duplication. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 7 ++- include/hw/acpi/ghes.h | 3 ++- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 529c14e3289f..35f793401d06 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -35,9 +35,6 @@ /* The max size in bytes for one error block */ #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB) -/* Now only support ARMv8 SEA notification type error source */ -#define ACPI_GHES_ERROR_SOURCE_COUNT1 - /* Generic Hardware Error Source version 2 */ #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10 @@ -411,7 +408,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address) AcpiGedState *acpi_ged_state; AcpiGhesState *ags; -assert(source_id < ACPI_HEST_SRC_ID_RESERVED); +assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT); acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL)); @@ -422,7 +419,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address) if (physical_address) { -if (source_id < ACPI_HEST_SRC_ID_RESERVED) { +if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) { start_addr += source_id * sizeof(uint64_t); } diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index 28b956acb19a..5421ffcbb7fa 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -59,7 +59,8 @@ enum AcpiGhesNotifyType { enum { ACPI_HEST_SRC_ID_SEA = 0, /* future ids go here */ -ACPI_HEST_SRC_ID_RESERVED, + +ACPI_GHES_ERROR_SOURCE_COUNT }; typedef struct AcpiGhesState { -- 2.46.0
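Making the count the last enumerator means it tracks the number of source IDs automatically as new IDs are added before it. A minimal illustration of the idiom; the name table and the two helpers are hypothetical and not part of the patch:

```c
#include <stdbool.h>

enum {
    ACPI_HEST_SRC_ID_SEA = 0,
    /* future ids go here */

    ACPI_GHES_ERROR_SOURCE_COUNT   /* always last: equals the number of IDs */
};

/* Hypothetical per-source table, sized by the count so it grows with the enum. */
static const char *const ghes_source_names[ACPI_GHES_ERROR_SOURCE_COUNT] = {
    [ACPI_HEST_SRC_ID_SEA] = "SEA",
};

/* Hypothetical range check, replacing a separate "reserved" sentinel. */
bool ghes_source_id_valid(int source_id)
{
    return source_id >= 0 && source_id < ACPI_GHES_ERROR_SOURCE_COUNT;
}

const char *ghes_source_name(int source_id)
{
    return ghes_source_id_valid(source_id) ? ghes_source_names[source_id]
                                           : "unknown";
}
```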
[PATCH v10 01/21] acpi/ghes: add a firmware file with HEST address
Store the HEST table address at a GPA, placing its content in the hest_addr_le variable. Signed-off-by: Mauro Carvalho Chehab --- Changes from v8: - hest_addr_le now points to the error source count and data. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 15 +++ include/hw/acpi/ghes.h | 1 + 2 files changed, 16 insertions(+) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index e9511d9b8f71..529c14e3289f 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -30,6 +30,7 @@ #define ACPI_GHES_ERRORS_FW_CFG_FILE "etc/hardware_errors" #define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr" +#define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr" /* The max size in bytes for one error block */ #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB) @@ -367,11 +368,22 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker, acpi_table_begin(&table, table_data); +int hest_offset = table_data->len; + /* Error Source Count */ build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4); build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker); acpi_table_end(linker, &table); + +/* + * tell firmware to write into GPA the address of HEST via fw_cfg, + * once initialized.
+ */ +bios_linker_loader_write_pointer(linker, + ACPI_HEST_ADDR_FW_CFG_FILE, 0, + sizeof(uint64_t), + ACPI_BUILD_TABLE_FILE, hest_offset); } void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s, @@ -385,6 +397,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s, fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL, NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false); +fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL, +NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false); + ags->present = true; } diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index 674f6958e905..28b956acb19a 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -63,6 +63,7 @@ enum { }; typedef struct AcpiGhesState { +uint64_t hest_addr_le; uint64_t ghes_addr_le; bool present; /* True if GHES is present at all on this board */ } AcpiGhesState; -- 2.46.0
[PATCH v10 11/21] acpi/ghes: don't crash QEMU if ghes GED is not found
Instead of asserting, report an error via errp and keep running. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index f54865423f69..e47c0238f3c5 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -421,7 +421,10 @@ void ghes_record_cper_errors(const void *cper, size_t len, acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL)); -g_assert(acpi_ged_state); +if (!acpi_ged_state) { +error_setg(errp, "Can't find ACPI_GED object"); +return; +} ags = &acpi_ged_state->ghes_state; hest_addr = le64_to_cpu(ags->hest_addr_le); -- 2.46.0
[PATCH v10 19/21] scripts/ghes_inject: add a script to generate GHES error inject
Using the QMP GHESv2 API requires preparing a raw data array containing a CPER record. Add a helper script with subcommands to prepare such data. Currently, only ARM Processor error CPER record is supported. Signed-off-by: Mauro Carvalho Chehab --- MAINTAINERS| 3 + scripts/arm_processor_error.py | 377 ++ scripts/ghes_inject.py | 51 +++ scripts/qmp_helper.py | 702 + 4 files changed, 1133 insertions(+) create mode 100644 scripts/arm_processor_error.py create mode 100755 scripts/ghes_inject.py create mode 100644 scripts/qmp_helper.py diff --git a/MAINTAINERS b/MAINTAINERS index 776f94efff02..8816132d40f6 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2100,6 +2100,9 @@ S: Maintained F: hw/arm/ghes_cper.c F: hw/acpi/ghes_cper_stub.c F: qapi/acpi-hest.json +F: scripts/ghes_inject.py +F: scripts/arm_processor_error.py +F: scripts/qmp_helper.py ppc4xx L: qemu-...@nongnu.org diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py new file mode 100644 index ..62e0c5662232 --- /dev/null +++ b/scripts/arm_processor_error.py @@ -0,0 +1,377 @@ +#!/usr/bin/env python3 +# +# pylint: disable=C0301,C0114,R0903,R0912,R0913,R0914,R0915,W0511 +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (C) 2024 Mauro Carvalho Chehab + +# TODO: current implementation has dummy defaults. +# +# For a better implementation, a QMP addition/call is needed to +# retrieve some data for ARM Processor Error injection: +# +# - ARM registers: power_state, mpidr. + +import argparse +import re + +from qmp_helper import qmp, util, cper_guid + +class ArmProcessorEinj: +""" +Implements ARM Processor Error injection via GHES +""" + +DESC = """ +Generates an ARM processor error CPER, compatible with +UEFI 2.9A Errata. 
+""" + +ACPI_GHES_ARM_CPER_LENGTH = 40 +ACPI_GHES_ARM_CPER_PEI_LENGTH = 32 + +# Context types +CONTEXT_AARCH32_EL1 = 1 +CONTEXT_AARCH64_EL1 = 5 +CONTEXT_MISC_REG = 8 + +def __init__(self, subparsers): +"""Initialize the error injection class and add subparser""" + +# Valid choice values +self.arm_valid_bits = { +"mpidr":util.bit(0), +"affinity": util.bit(1), +"running": util.bit(2), +"vendor": util.bit(3), +} + +self.pei_flags = { +"first":util.bit(0), +"last": util.bit(1), +"propagated": util.bit(2), +"overflow": util.bit(3), +} + +self.pei_error_types = { +"cache":util.bit(1), +"tlb": util.bit(2), +"bus": util.bit(3), +"micro-arch": util.bit(4), +} + +self.pei_valid_bits = { +"multiple-error": util.bit(0), +"flags":util.bit(1), +"error-info": util.bit(2), +"virt-addr":util.bit(3), +"phy-addr": util.bit(4), +} + +self.data = bytearray() + +parser = subparsers.add_parser("arm", description=self.DESC) + +arm_valid_bits = ",".join(self.arm_valid_bits.keys()) +flags = ",".join(self.pei_flags.keys()) +error_types = ",".join(self.pei_error_types.keys()) +pei_valid_bits = ",".join(self.pei_valid_bits.keys()) + +# UEFI N.16 ARM Validation bits +g_arm = parser.add_argument_group("ARM processor") +g_arm.add_argument("--arm", "--arm-valid", + help=f"ARM valid bits: {arm_valid_bits}") +g_arm.add_argument("-a", "--affinity", "--level", "--affinity-level", + type=lambda x: int(x, 0), + help="Affinity level (when multiple levels apply)") +g_arm.add_argument("-l", "--mpidr", type=lambda x: int(x, 0), + help="Multiprocessor Affinity Register") +g_arm.add_argument("-i", "--midr", type=lambda x: int(x, 0), + help="Main ID Register") +g_arm.add_argument("-r", "--running", + action=argparse.BooleanOptionalAction, + default=None, + help="Indicates if the processor is running or not") +g_arm.add_argument("--psci", "--psci-state", + type=lambda x: int(x,
[PATCH v10 05/21] acpi/ghes: better handle source_id and notification
GHES has two fields with somewhat related meanings: - notification type, which is a number defined in the ACPI spec, covering several arch-specific synchronous and asynchronous types; - source id, which is a HW/FW-defined number, used to distinguish between the different implemented hardware report mechanisms. The current logic is ARM-specific, implementing a single source ID for an ARMv8-specific synchronous report mechanism (SEA). Clean up the code to make it easier to add other types and to make it portable to non-ARM architectures. As a side effect of this change, build_ghes_error_table() is now an internal function. Signed-off-by: Mauro Carvalho Chehab --- Changes from v8: - Non-rename/cleanup changes merged altogether; - source ID is now more generic, defined per guest target. That should make it easier to add support for x86. Signed-off-by: Mauro Carvalho Chehab --- hw/acpi/ghes.c | 31 +++ hw/arm/virt-acpi-build.c | 5 ++--- include/hw/acpi/ghes.h | 6 +++--- 3 files changed, 20 insertions(+), 22 deletions(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 17b7d9e10f3e..939e89723a2f 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -234,7 +234,7 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address, * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs. * See docs/specs/acpi_hest_ghes.rst for blobs format.
*/ -void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) +static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) { int i, error_status_block_offset; @@ -285,9 +285,13 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker) } /* Build Generic Hardware Error Source version 2 (GHESv2) */ -static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker) +static void build_ghes_v2(GArray *table_data, + BIOSLinker *linker, + enum AcpiGhesNotifyType notify, + uint16_t source_id) { uint64_t address_offset; + /* * Type: * Generic Hardware Error Source version 2(GHESv2 - Type 10) @@ -317,18 +321,8 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker) address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t)); -switch (source_id) { -case ACPI_HEST_SRC_ID_SEA: -/* - * Notification Structure - * Now only enable ARMv8 SEA notification type - */ -build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA); -break; -default: -error_report("Not support this error source"); -abort(); -} +/* Notification Structure */ +build_ghes_hw_error_notification(table_data, notify); /* Error Status Block Length */ build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4); @@ -357,19 +351,24 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker) } /* Build Hardware Error Source Table */ -void acpi_build_hest(GArray *table_data, BIOSLinker *linker, +void acpi_build_hest(GArray *table_data, GArray *hardware_errors, + BIOSLinker *linker, const char *oem_id, const char *oem_table_id) { AcpiTable table = { .sig = "HEST", .rev = 1, .oem_id = oem_id, .oem_table_id = oem_table_id }; +build_ghes_error_table(hardware_errors, linker); + acpi_table_begin(&table, table_data); +/* Beginning at the HEST Error Source struct count and data */ int hest_offset = table_data->len; /* Error Source Count */ 
build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4); -build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker); +build_ghes_v2(table_data, linker, + ACPI_GHES_NOTIFY_SEA, ACPI_HEST_SRC_ID_SEA); acpi_table_end(linker, &table); diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index f76fb117adff..bafd9a56c217 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -943,10 +943,9 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables) build_dbg2(tables_blob, tables->linker, vms); if (vms->ras) { -build_ghes_error_table(tables->hardware_errors, tables->linker); acpi_add_table(table_offsets, tables_blob); -acpi_build_hest(tables_blob, tables->linker, vms->oem_id, -vms->oem_table_id); +acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker, +vms->oem_id, vms->oem_table_id); } if (ms->numa_state->num_nodes > 0) { diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index 5421ffcbb7fa..c
[PATCH v10 17/21] qapi/acpi-hest: add an interface to do generic CPER error injection
Create a QMP command to be used for generic ACPI APEI hardware error injection (HEST) via GHESv2, and add support for it on ARM guests. Error injection uses the ACPI_HEST_SRC_ID_QMP source ID so that it is platform-independent. This ID is mapped in the per-arch virt bindings, depending on the types supported by QEMU and by the BIOS. On ARM, it is supported via the ACPI_GHES_NOTIFY_GPIO notification type. This patch is co-authored by: - original ghes logic to inject a simple ARM record by Shiju Jose; - generic logic to handle block addresses by Jonathan Cameron; - generic GHESv2 error inject by Mauro Carvalho Chehab; Co-authored-by: Jonathan Cameron Co-authored-by: Shiju Jose Co-authored-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Cameron Signed-off-by: Shiju Jose Signed-off-by: Mauro Carvalho Chehab --- Changes since v9: - ARM source IDs renamed to reflect SYNC/ASYNC; - command name changed to better reflect what it does; - some improvements to the JSON documentation; - add a check for the QMP source in the notification logic.
Signed-off-by: Mauro Carvalho Chehab --- MAINTAINERS | 7 +++ hw/acpi/Kconfig | 5 + hw/acpi/ghes.c | 2 +- hw/acpi/ghes_cper.c | 32 hw/acpi/ghes_cper_stub.c | 19 +++ hw/acpi/meson.build | 2 ++ hw/arm/virt-acpi-build.c | 1 + hw/arm/virt.c| 7 +++ include/hw/acpi/ghes.h | 1 + include/hw/arm/virt.h| 1 + qapi/acpi-hest.json | 35 +++ qapi/meson.build | 1 + qapi/qapi-schema.json| 1 + 13 files changed, 113 insertions(+), 1 deletion(-) create mode 100644 hw/acpi/ghes_cper.c create mode 100644 hw/acpi/ghes_cper_stub.c create mode 100644 qapi/acpi-hest.json diff --git a/MAINTAINERS b/MAINTAINERS index c59f7b253825..776f94efff02 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2094,6 +2094,13 @@ F: hw/acpi/ghes.c F: include/hw/acpi/ghes.h F: docs/specs/acpi_hest_ghes.rst +ACPI/HEST/GHES/ARM processor CPER +R: Mauro Carvalho Chehab +S: Maintained +F: hw/arm/ghes_cper.c +F: hw/acpi/ghes_cper_stub.c +F: qapi/acpi-hest.json + ppc4xx L: qemu-...@nongnu.org S: Orphan diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig index e07d3204eb36..73ffbb82c150 100644 --- a/hw/acpi/Kconfig +++ b/hw/acpi/Kconfig @@ -51,6 +51,11 @@ config ACPI_APEI bool depends on ACPI +config GHES_CPER +bool +depends on ACPI_APEI +default y + config ACPI_PCI bool depends on ACPI && PCI diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 7bea265c7ef3..d7d147ef40f2 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -503,7 +503,7 @@ void ghes_record_cper_errors(const void *cper, size_t len, /* Write the generic error data entry into guest memory */ cpu_physical_memory_write(cper_addr, cper, len); -notifier_list_notify(&acpi_generic_error_notifiers, NULL); +notifier_list_notify(&acpi_generic_error_notifiers, &source_id); } int acpi_ghes_memory_errors(int source_id, uint64_t physical_address) diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c new file mode 100644 index ..02c47b41b990 --- /dev/null +++ b/hw/acpi/ghes_cper.c @@ -0,0 +1,32 @@ +/* + * CPER payload parser for error injection + * + * Copyright(C) 2024 Huawei 
LTD. + * + * This code is licensed under the GPL version 2 or later. See the + * COPYING file in the top-level directory. + * + */ + +#include "qemu/osdep.h" + +#include "qemu/base64.h" +#include "qemu/error-report.h" +#include "qemu/uuid.h" +#include "qapi/qapi-commands-acpi-hest.h" +#include "hw/acpi/ghes.h" + +void qmp_inject_ghes_error(const char *qmp_cper, Error **errp) +{ + +uint8_t *cper; +size_t len; + +cper = qbase64_decode(qmp_cper, -1, &len, errp); +if (!cper) { +error_setg(errp, "missing GHES CPER payload"); +return; +} + +ghes_record_cper_errors(cper, len, ACPI_HEST_SRC_ID_QMP, errp); +} diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c new file mode 100644 index ..8782e2c02fa8 --- /dev/null +++ b/hw/acpi/ghes_cper_stub.c @@ -0,0 +1,19 @@ +/* + * Stub interface for CPER payload parser for error injection + * + * Copyright(C) 2024 Huawei LTD. + * + * This code is licensed under the GPL version 2 or later. See the + * COPYING file in the top-level directory. + * + */ + +#include "qemu/osdep.h" +#include "qapi/error.h" +#include "qapi/qapi-commands-acpi-hest.h" +#include "hw/acpi/ghes.h" + +void qmp_inject_ghes_error(const char *cper, Error **errp) +{ +error_setg(errp, "GHES QMP error inject is not compiled in"); +} diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build index fa5c07db9068..6cbf430eb66d 100644 --- a/hw/acpi/meson.build +++ b/hw/acpi/meson.build @@ -34,4 +34,6 @@ endif system_ss.add(when: 'CONFIG_ACPI'
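The QMP command expects the CPER record base64-encoded, since ghes_cper.c passes the argument to qbase64_decode(). The sketch below encodes a raw buffer with the standard RFC 4648 alphabet and formats a request. The command name `inject-ghes-error` is inferred from the QAPI naming convention for `qmp_inject_ghes_error()`, and the argument name `cper` is an assumption taken from the stub's parameter name; check qapi/acpi-hest.json before relying on either. Both helper functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Standard (RFC 4648) base64 with '=' padding; caller frees the result. */
char *base64_encode(const uint8_t *in, size_t len)
{
    size_t out_len = 4 * ((len + 2) / 3);
    char *out = malloc(out_len + 1);
    size_t o = 0;

    if (!out) {
        return NULL;
    }
    for (size_t i = 0; i < len; i += 3) {
        uint32_t v = (uint32_t)in[i] << 16;

        if (i + 1 < len) {
            v |= (uint32_t)in[i + 1] << 8;
        }
        if (i + 2 < len) {
            v |= in[i + 2];
        }
        out[o++] = b64_alphabet[(v >> 18) & 0x3f];
        out[o++] = b64_alphabet[(v >> 12) & 0x3f];
        out[o++] = (i + 1 < len) ? b64_alphabet[(v >> 6) & 0x3f] : '=';
        out[o++] = (i + 2 < len) ? b64_alphabet[v & 0x3f] : '=';
    }
    out[o] = '\0';
    return out;
}

/*
 * Format a QMP request carrying @cper. ASSUMPTION: command name
 * "inject-ghes-error" and argument name "cper".
 */
int format_ghes_inject(char *buf, size_t size, const uint8_t *cper, size_t len)
{
    char *b64 = base64_encode(cper, len);
    int n;

    if (!b64) {
        return -1;
    }
    n = snprintf(buf, size,
                 "{ \"execute\": \"inject-ghes-error\", "
                 "\"arguments\": { \"cper\": \"%s\" } }", b64);
    free(b64);
    return n;
}
```

As in the shell script earlier in this thread, a client must send `{ "execute": "qmp_capabilities" }` on the QMP socket before the injection request.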