On Mon, 16 Oct 2023 at 18:49, Paolo Bonzini <pbonz...@redhat.com> wrote: > > switching to hv_vcpu_run_until() WITHOUT hv_vcpu_interrupt() > > causes some very obvious problems where the vCPU simply > > doesn't exit at all for long periods.) > > Yes, that makes sense. It looks like hv_vcpu_run_until() has an > equivalent of a "do ... while (errno == EINTR)" loop inside it.
Thinking about this some more, I wonder if it's worth putting together some test code to check empirically how signals and hv_vcpu_interrupt interact with each of the 2 vcpu_run implementations. It should be pretty straightforward to establish whether calling hv_vcpu_interrupt *before* hv_vcpu_run*() causes an instant exit when it is called, and whether the signal causes hv_vcpu_run() to exit. (Based on observed behaviour, I'm already pretty confident hv_vcpu_run_until() ignores signals until it exits for other reasons.) > > 1. hv_vcpu_run() exits very frequently, and often there is actually > > nothing for the VMM to do except call hv_vcpu_run() again. With > > Qemu's current hvf backend, each exit causes a BQL acquisition, > > and VMs with a bunch of vCPUs rapidly become limited by BQL > > contention according to my profiling. > > Yes, that should be fixed anyway, but I agree it is a separate issue. I've locally got a bunch of very messy patches for reducing BQL acquisition in the x86 HVF vCPU loop. I found it difficult to make much of a real difference however, and the code gets a lot harder to follow. - Decoding instructions that need emulating can be done outside the lock. Actually running the instruction typically ends up causing an action that needs it though, so the win isn't that big. - With hv_vcpu_run() there are a bunch of (EPT fault) exits that don't really need anything in particular to be done. Or instead of detecting those outside the lock you can switch to hv_vcpu_run_until() which avoids those exits altogether. - KVM's vCPU loop calls into MMIO faults without the BQL, but they acquire it internally I think? - Going from xAPIC to x2APIC reduces the number of exits, and using MSRs is a bit quicker than walking the memory hierarchy, so that definitely helps too, but the APIC implementation still needs the BQL held, and untangling that is probably harder than switching to an in-kernel APIC. Beyond that: there's a good chance that turning the BQL into a read-write lock would help, but that's a much bigger undertaking than I'm currently willing to entertain! Re hvf fd: > I think fd and the HVF type should be placed in an anonymous union. Taking a look at the code and thinking through the implications, I'm not actually sure what the intention of the union is? IIRC Qemu is built with strict aliasing rules disabled, but there seems little point in actively using union aliasing here? We can just as easily modify this block at the top of hvf_int.h: #ifdef __aarch64__ #include <Hypervisor/Hypervisor.h> #else #include <Hypervisor/hv.h> #endif to something like: #ifdef __aarch64__ #include <Hypervisor/Hypervisor.h> typedef hv_vcpu_t hvf_vcpu_id; #else #include <Hypervisor/hv.h> typedef hv_vcpuid_t hvf_vcpu_id; #endif And then: struct AccelCPUState { hvf_vcpu_id fd; void *exit; … Or perhaps this variant, if we want AccelCPUState to have consistent size/alignment properties, we can use a union after all, but never actually use the fd_padding field: struct AccelCPUState { union { hvf_vcpu_id fd; uint64_t fd_padding; }; void *exit; … (Or is that what you were thinking of with the union idea in the first place?) Thanks, Phil