On 2/24/25 8:47 AM, Peter Maydell wrote:
On Mon, 24 Feb 2025 at 11:29, Daniel Henrique Barboza
<dbarb...@ventanamicro.com> wrote:



On 2/24/25 6:59 AM, Peter Maydell wrote:
On Thu, 20 Feb 2025 at 16:14, Daniel Henrique Barboza
<dbarb...@ventanamicro.com> wrote:

riscv_cpu_reset_hold() does a lot of TCG-related initializations that
aren't relevant for KVM, but nevertheless are impacting the reset state
of KVM vcpus.

When running a KVM guest, kvm_riscv_reset_vcpu() is called at the end of
reset_hold(). At that point env->mstatus is initialized to a non-zero
value, and it will be use to write 'sstatus' in the vcpu
(kvm_arch_put_registers() then kvm_riscv_put_regs_csr()).

Do an early exit in riscv_cpu_reset_hold() if we're running KVM. All the
KVM reset procedure will be centered in kvm_riscv_reset_vcpu().

While we're at it, remove the kvm_enabled() check in
kvm_riscv_reset_vcpu() since it's already being gated in
riscv_cpu_reset_hold().

This looks super odd, from an "I don't know anything about
riscv specifics" position. Generally the idea is:
   * reset in QEMU should reset the CPU state
   * what a reset CPU looks like doesn't differ between
     accelerators
   * when we start the KVM CPU, we copy the state from QEMU
     to the kernel, and then the kernel's idea of the reset state
     matches

This patch looks like it's entirely skipping basically
all of the QEMU CPU state reset code specifically for KVM.

Not sure I understood what you said here.

Without this patch, riscv_cpu_reset_hold() is doing initializations that are TCG
based, both for user mode and system mode, and in the end is calling the kvm
specific reset function if we're running KVM. This patch is simply skipping
all the TCG related reset procedures if we're running KVM. So the patch isn't
skipping the KVM specific QEMU CPU reset code, it is skipping the TCG specific
reset code if we're running KVM.

What I'm saying is that you shouldn't have "TCG reset" and
"KVM reset" that are totally different code paths, but that the
reset function should be doing "reset the CPU", and then the
KVM codepath makes specific decisions about "for KVM these
particular things should be the kernel's reset settings" and
then passes that state over to the kernel.

Granted, after applying patches 2 and 3 we could discard this patch because
now we're resetting all that KVM needs in kvm_reset_vcpu(), but why go
through the reset steps for TCG if we're going to overwrite them later during
kvm_reset_vcpu()?

The idea is that you only overwrite specific state where
you've decided "actually the kernel is the authoritative
source for what the reset state for these registers is".

So now you'll have two different pieces of code controlling
reset for different accelerators, and the resulting CPU
state won't be consistent between them...

That is already the case even without this patch, doesn't it? If we have to call
a specific kvm reset function during cpu reset_hold then we're already in a 
point
where the reset procedure is differing between accelerators. I won't say that
this is a good design but I don't see it as a problem.

For instance, going to a code you're probably more familiar with, 
target/arm/cpu.c,
arm_cpu_reset_hold(), is doing something similar to what riscv_cpu_reset_hold() 
is
doing without this patch: a lot of TCG setups are made, then 
kvm_arm_reset_vcpu() is
called in the end if kvm_enabled(). kvm_arm_reset_vcpu() then overwrites at 
least some
of the TCG specific setup that was done before:

      /* Re-init VCPU so that all registers are set to
       * their respective reset values.
       */
      ret = kvm_arm_vcpu_init(cpu);

kvm_arm_vcpu_init() is doing a KVM_ARM_VCPU_INIT ioctl that will populate the 
CPU object
with the kernel specific feature bitmask and so on. Note that this is not 
copying the TCG
setup into the kernel, it is in fact doing the opposite.

The way this code path in Arm works is:
  * we reset all the QEMU-side CPU state struct in arm_cpu_reset_hold()
  * kvm_arm_reset_vcpu() calls kvm_arm_vcpu_init() which inits
    the KVM side vcpu
  * kvm_arm_reset_vcpu() calls the sequence write_kvmstate_to_list()
    followed by write_list_to_cpustate() which does "for the system
    registers specifically, read the values (and which registers we
    have) from KVM, and write them to the QEMU CPU state struct".
    We do this because on Arm we've said that the system registers
    in particular have the kernel as their authoritative source.
    (IIRC x86 makes QEMU the authority for its similar registers,
    so it has to do even less in its kvm_arch_reset_vcpu().)
  * the generic CPU core code will call kvm_arch_put_registers()
    before starting the vCPU, which copies whatever is in the QEMU-side
    CPU state struct into KVM

The upshot is that QEMU is the authority and arm_cpu_reset_hold()
defines the reset value for all CPU state by default. (For instance,
reset values for general purpose registers are set there.) But for
specific parts of the state where we want KVM to be the authority,
kvm_arm_reset_vcpu() gets to override that by filling in the CPU
state struct by asking the kernel what its values are.

Note that my intention here isn't to make a case that the ARM KVM cpu doesn't 
need anything
that is being done in arm_cpu_reset_hold(). My point here is that KVM and TCG 
CPUs will have
different reset setups for some archs. For RISC-V I can say that KVM CPU does 
not rely on
anything that the TCG reset code is doing, hence why I sent this patch to make 
it official.

What I'm trying to suggest here is that we don't want different
architectures to end up doing things differently. There are
multiple different design patterns that will work, but it will
be easier to work on QEMU if we can standardise on doing it
the same way across architectures.

I can get behind this argument.

For this patch I'll just remove the redundant !kvm_enabled() check in 
kvm_riscv_reset_vcpu().
Let's keep the kvm_riscv_reset_vcpu() call at the end of riscv_cpu_reset_hold() 
to keep
the design where we have a single reset procedure, overwriting the needed state 
where
we want KVM to be the authority instead of QEMU.


Thanks,

Daniel


thanks
-- PMM


Reply via email to