On Fri, 14 Feb 2025 14:16:31 +1000, Gavin Shan <gs...@redhat.com> wrote:

> Currently, there is only one CPER buffer (entry), meaning only one
> memory error can be reported. In extreme case, multiple memory errors
> can be raised on different vCPUs. For example, a single memory error
> on a 64KB page of the host can result in 16 memory errors to 4KB
> pages of the guest.

A patchset that allows multiple CPER entries has been floating around
since last year:

https://lore.kernel.org/qemu-devel/cover.1738345063.git.mchehab+hua...@kernel.org/

I guess it is almost ready to be merged; it just needs some nitpick
changes to satisfy the ACPI maintainers. That changeset already adds a
second CPER entry for GED and makes it easy to add more as needed.

> Unfortunately, the virtual machine is simply aborted
> by multiple concurrent memory errors, as the following call trace shows.
> A SEA exception is injected to the guest so that the CPER buffer can
> be claimed if the error is successfully pushed by acpi_ghes_memory_errors().
> Otherwise, abort() is triggered to crash the virtual machine.
>
>   kvm_vcpu_thread_fn
>     kvm_cpu_exec
>       kvm_arch_on_sigbus_vcpu
>         kvm_cpu_synchronize_state
>         acpi_ghes_memory_errors (a)
>         kvm_inject_arm_sea | abort
>
> It's arguable to crash the virtual machine in this case. The better
> behaviour would be to retry on pushing the memory errors, to keep the
> virtual machine alive so that the administrator has a chance to chime
> in, for example to dump the important data with luck. This series
> adds one more parameter to acpi_ghes_memory_errors() so that it will
> be tried to push the memory error until it succeeds.

Having a retry buffer might be interesting for some types of errors,
such as injected errors and corrected errors. Yet it doesn't sound
right to buffer uncorrected errors that would affect the virtual
machine.

>
> Gavin Shan (4):
>   acpi/ghes: Make ghes_record_cper_errors() static
>   acpi/ghes: Use error_report() in ghes_record_cper_errors()
>   acpi/ghes: Allow retry to write CPER errors
>   target/arm: Retry pushing CPER error if necessary
>
>  hw/acpi/ghes-stub.c    |  3 ++-
>  hw/acpi/ghes.c         | 45 +++++++++++++++++++++---------------------
>  include/hw/acpi/ghes.h |  5 ++---
>  target/arm/kvm.c       | 31 +++++++++++++++++++++++------
>  4 files changed, 51 insertions(+), 33 deletions(-)
>

Thanks,
Mauro