On Mon, 24 Feb 2025 at 11:29, Daniel Henrique Barboza <dbarb...@ventanamicro.com> wrote: > > > > On 2/24/25 6:59 AM, Peter Maydell wrote: > > On Thu, 20 Feb 2025 at 16:14, Daniel Henrique Barboza > > <dbarb...@ventanamicro.com> wrote: > >> > >> riscv_cpu_reset_hold() does a lot of TCG-related initializations that > >> aren't relevant for KVM, but nevertheless are impacting the reset state > >> of KVM vcpus. > >> > >> When running a KVM guest, kvm_riscv_reset_vcpu() is called at the end of > >> reset_hold(). At that point env->mstatus is initialized to a non-zero > >> value, and it will be use to write 'sstatus' in the vcpu > >> (kvm_arch_put_registers() then kvm_riscv_put_regs_csr()). > >> > >> Do an early exit in riscv_cpu_reset_hold() if we're running KVM. All the > >> KVM reset procedure will be centered in kvm_riscv_reset_vcpu(). > >> > >> While we're at it, remove the kvm_enabled() check in > >> kvm_riscv_reset_vcpu() since it's already being gated in > >> riscv_cpu_reset_hold().
> > This looks super odd, from an "I don't know anything about > > riscv specifics" position. Generally the idea is: > > * reset in QEMU should reset the CPU state > > * what a reset CPU looks like doesn't differ between > > accelerators > > * when we start the KVM CPU, we copy the state from QEMU > > to the kernel, and then the kernel's idea of the reset state > > matches > > > > This patch looks like it's entirely skipping basically > > all of the QEMU CPU state reset code specifically for KVM. > > Not sure I understood what you said here. > > Without this patch, riscv_cpu_reset_hold() is doing initializations that are > TCG > based, both for user mode and system mode, and in the end is calling the kvm > specific reset function if we're running KVM. This patch is simply skipping > all the TCG related reset procedures if we're running KVM. So the patch isn't > skipping the KVM specific QEMU CPU reset code, it is skipping the TCG specific > reset code if we're running KVM. What I'm saying is that you shouldn't have "TCG reset" and "KVM reset" that are totally different code paths, but that the reset function should be doing "reset the CPU", and then the KVM codepath makes specific decisions about "for KVM these particular things should be the kernel's reset settings" and then passes that state over to the kernel. > Granted, after applying patches 2 and 3 we could discard this patch because > now we're resetting all that KVM needs in kvm_reset_vcpu(), but why go > through the reset steps for TCG if we're going to overwrite them later during > kvm_reset_vcpu()? The idea is that you only overwrite specific state where you've decided "actually the kernel is the authoritative source for what the reset state for these registers is". > > So now you'll have two different pieces of code controlling > > reset for different accelerators, and the resulting CPU > > state won't be consistent between them... > > That is already the case even without this patch, doesn't it? If we have to > call > a specific kvm reset function during cpu reset_hold then we're already in a > point > where the reset procedure is differing between accelerators. I won't say that > this is a good design but I don't see it as a problem. > For instance, going to a code you're probably more familiar with, > target/arm/cpu.c, > arm_cpu_reset_hold(), is doing something similar to what > riscv_cpu_reset_hold() is > doing without this patch: a lot of TCG setups are made, then > kvm_arm_reset_vcpu() is > called in the end if kvm_enabled(). kvm_arm_reset_vcpu() then overwrites at > least some > of the TCG specific setup that was done before: > > /* Re-init VCPU so that all registers are set to > * their respective reset values. > */ > ret = kvm_arm_vcpu_init(cpu); > > kvm_arm_vcpu_init() is doing a KVM_ARM_VCPU_INIT ioctl that will populate the > CPU object > with the kernel specific feature bitmask and so on. Note that this is not > copying the TCG > setup into the kernel, it is in fact doing the opposite. The way this code path in Arm works is: * we reset all the QEMU-side CPU state struct in arm_cpu_reset_hold() * kvm_arm_reset_vcpu() calls kvm_arm_vcpu_init() which inits the KVM side vcpu * kvm_arm_reset_vcpu() calls the sequence write_kvmstate_to_list() followed by write_list_to_cpustate() which does "for the system registers specifically, read the values (and which registers we have) from KVM, and write them to the QEMU CPU state struct". We do this because on Arm we've said that the system registers in particular have the kernel as their authoritative source. (IIRC x86 makes QEMU the authority for its similar registers, so it has to do even less in its kvm_arch_reset_vcpu().) * the generic CPU core code will call kvm_arch_put_registers() before starting the vCPU, which copies whatever is in the QEMU-side CPU state struct into KVM The upshot is that QEMU is the authority and arm_cpu_reset_hold() defines the reset value for all CPU state by default. (For instance, reset values for general purpose registers are set there.) But for specific parts of the state where we want KVM to be the authority, kvm_arm_reset_vcpu() gets to override that by filling in the CPU state struct by asking the kernel what its values are. > Note that my intention here isn't to make a case that the ARM KVM cpu doesn't > need anything > that is being done in arm_cpu_reset_hold(). My point here is that KVM and TCG > CPUs will have > different reset setups for some archs. For RISC-V I can say that KVM CPU does > not rely on > anything that the TCG reset code is doing, hence why I sent this patch to > make it official. What I'm trying to suggest here is that we don't want different architectures to end up doing things differently. There are multiple different design patterns that will work, but it will be easier to work on QEMU if we can standardise on doing it the same way across architectures. thanks -- PMM