With a 64KB host and a 4KB guest, one problematic host page affects 16 guest pages. In this specific case, it's reasonable to push 16 consecutive memory CPERs. Otherwise, QEMU can dump core because the current error can't be delivered while the previous error hasn't been acknowledged. This happens because the host page can be accessed in parallel, given the mismatched host and guest page sizes.
Improve push_ghes_memory_errors() to push 16 consecutive memory CPERs for this specific case. The maximal error block size is bumped to 4KB, providing enough storage space for those 16 memory CPERs.

Signed-off-by: Gavin Shan <gs...@redhat.com>
---
 hw/acpi/ghes.c   |  2 +-
 target/arm/kvm.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 34ff682048..43d52f5e2e 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -33,7 +33,7 @@
 #define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr"
 
 /* The max size in bytes for one error block */
-#define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
+#define ACPI_GHES_MAX_RAW_DATA_LENGTH (4 * KiB)
 
 /* Generic Hardware Error Source version 2 */
 #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index e31fcde797..c346bd7b49 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -11,6 +11,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include <sys/ioctl.h>
 
 #include <linux/kvm.h>
@@ -2337,10 +2338,53 @@ static void push_ghes_memory_errors(CPUState *c, AcpiGhesState *ags,
                                     uint64_t paddr)
 {
     GArray *addresses = g_array_new(false, false, sizeof(paddr));
+    uint64_t val, start, end, guest_pgsz, host_pgsz;
     int ret;
 
     kvm_cpu_synchronize_state(c);
-    g_array_append_vals(addresses, &paddr, 1);
+
+    /*
+     * Sort out the guest page size from TCR_EL1, which can be modified
+     * by the guest from time to time. So we have to sort it out dynamically.
+     */
+    ret = read_sys_reg64(c->kvm_fd, &val, ARM64_SYS_REG(3, 0, 2, 0, 2));
+    if (ret) {
+        goto error;
+    }
+
+    switch (extract64(val, 14, 2)) {
+    case 0:
+        guest_pgsz = 4 * KiB;
+        break;
+    case 1:
+        guest_pgsz = 64 * KiB;
+        break;
+    case 2:
+        guest_pgsz = 16 * KiB;
+        break;
+    default:
+        error_report("unknown page size from TCR_EL1 (0x%" PRIx64 ")", val);
+        goto error;
+    }
+
+    host_pgsz = qemu_real_host_page_size();
+    start = paddr & ~(host_pgsz - 1);
+    end = start + host_pgsz;
+    while (start < end) {
+        /*
+         * The precise physical address is provided for the affected
+         * guest page that contains @paddr. Otherwise, the starting
+         * address of the guest page is provided.
+         */
+        if (paddr >= start && paddr < (start + guest_pgsz)) {
+            g_array_append_vals(addresses, &paddr, 1);
+        } else {
+            g_array_append_vals(addresses, &start, 1);
+        }
+
+        start += guest_pgsz;
+    }
+
     ret = acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC, addresses);
     if (ret) {
         goto error;
-- 
2.49.0
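
For readers who want to check the address enumeration outside of QEMU, the logic in the patch can be sketched as standalone C. This is only an illustration under stated assumptions: tg0_to_page_size() and collect_error_addresses() are hypothetical helper names that do not exist in QEMU, the KVM register read is replaced by a plain TCR_EL1 value passed in by the caller, and a caller-supplied array stands in for the GArray.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KIB 1024ULL

/*
 * Hypothetical helper (not a QEMU function): decode the TG0 field,
 * bits [15:14] of TCR_EL1, into a page size in bytes.
 * Returns 0 for the reserved encoding.
 */
static uint64_t tg0_to_page_size(uint64_t tcr_el1)
{
    switch ((tcr_el1 >> 14) & 0x3) {
    case 0:
        return 4 * KIB;
    case 1:
        return 64 * KIB;
    case 2:
        return 16 * KIB;
    default:
        return 0;
    }
}

/*
 * Hypothetical helper (not a QEMU function): write one address per guest
 * page covered by the host page that contains @paddr. The guest page that
 * contains @paddr gets the precise address, every other guest page gets
 * its starting address. Returns the number of addresses written, capped
 * at @max.
 */
static size_t collect_error_addresses(uint64_t paddr, uint64_t host_pgsz,
                                      uint64_t guest_pgsz,
                                      uint64_t *out, size_t max)
{
    uint64_t start, end;
    size_t n = 0;

    /* Page sizes are assumed to be non-zero powers of two. */
    assert(host_pgsz && (host_pgsz & (host_pgsz - 1)) == 0);
    if (guest_pgsz == 0) {
        return 0;
    }

    start = paddr & ~(host_pgsz - 1);
    end = start + host_pgsz;
    for (; start < end && n < max; start += guest_pgsz) {
        if (paddr >= start && paddr < start + guest_pgsz) {
            out[n++] = paddr;
        } else {
            out[n++] = start;
        }
    }

    return n;
}
```

With a 64KB host page, a 4KB guest page and paddr 0x12345, the sketch yields 16 addresses from 0x10000 to 0x1f000, with 0x12345 in place of 0x12000. When the guest page is at least as large as the host page, it yields the single address paddr, which matches the behaviour before this patch.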