On 2/21/25 9:04 PM, Jonathan Cameron wrote:
On Fri, 21 Feb 2025 15:27:36 +1000
Gavin Shan <gs...@redhat.com> wrote:
[...]
I would say #1 is the ideal model because the read_ack_register is the
bottleneck, and it should be scaled up to max_cpus. That way, the bottleneck
is avoided from the bottom. Another benefit of #1 is that the error can be
delivered immediately to the vCPU where it was raised. That matches the
semantics of SEA to me.
I don't think it helps with the bottleneck, in Linux at least. A whole bunch
of locks are taken on each SEA because of the novel use of the fixmap. There
is only one VA ever used to access the error status blocks; we just change
which PA it points to under a spin lock. Maybe that can be improved on if we
can persuade people that error-handling performance is a thing to care about!
Right, it doesn't help with the bottleneck in the guest kernel due to
@ghes_notify_lock_sea. With that lock, access to all existing GHES devices
and error statuses is serialized. I was actually talking about the benefit
of avoiding the bottleneck at the read_ack_register, which is the
synchronization mechanism between the guest kernel and QEMU. For example,
an error has been raised on vCPU-0 but not yet acknowledged at (A). Another
error raised on vCPU-1 can't be delivered because we have only one GHES
device and error status block, which has been reserved for the error raised
on vCPU-0. With solution #1, that bottleneck can be avoided by having
multiple GHES devices and error status blocks.
  vCPU-0                                          vCPU-1
  ======                                          ======
  kvm_cpu_exec                                    kvm_cpu_exec
  kvm_vcpu_ioctl(RUN)                             kvm_vcpu_ioctl(RUN)
  kvm_arch_on_sigbus_vcpu                         kvm_arch_on_sigbus_vcpu
  acpi_ghes_memory_errors                         acpi_ghes_memory_errors   (B)
  kvm_inject_arm_sea
  kvm_vcpu_ioctl(RUN)
    :
  do_mem_abort
  do_sea
  apei_claim_sea
  ghes_notify_sea
    raw_spin_lock(&ghes_notify_lock_sea)
    ghes_in_nmi_spool_from_list
      ghes_in_nmi_queue_one_entry
        ghes_clear_estatus                        (A)
    raw_spin_unlock(&ghes_notify_lock_sea)
Thanks,
Gavin