On Thu, Apr 17, 2025 at 09:48:39AM -0300, Daniel Henrique Barboza wrote: > The Linux kernel, up until at least version 6.12, has issues with a KVM > guest setting scounteren to 0 during reset. No error will be thrown > during boot, but trying to use rdtime in the guest (by executing 'ping' > for example) will result in a stack trace and an illegal instruction > error: > > / # ping 8.8.8.8 > PING 3.33.130.190 (8.8.8.8): 56 data bytes > [ 19.464484] ping[56]: unhandled signal 4 code 0x1 at 0x00007fffae0665f4 > [ 19.493332] CPU: 0 PID: 56 Comm: ping Not tainted 6.9.0-rc3-dbarboza #7 > [ 19.523249] Hardware name: riscv-virtio,qemu (DT) > [ 19.546641] epc : 00007fffae0665f4 ra : 00000000000c6316 sp : > 00007fffc7cfd9f0 > [ 19.576214] gp : 0000000000172408 tp : 00000000001767a0 t0 : > 0000000000000000 > [ 19.606719] t1 : 0000000000000020 t2 : 0000000000000000 s0 : > 00007fffc7cfda00 > [ 19.640938] s1 : 00007fffc7cfda30 a0 : 0000000000000001 a1 : > 00007fffc7cfda30 > [ 19.676698] a2 : 0000000000000000 a3 : 00000000000009ee a4 : > 00007fffae064000 > [ 19.721036] a5 : 0000000000000001 a6 : 0000000000000000 a7 : > 00000000001784d0 > [ 19.765061] s2 : 00000000001784d0 s3 : 000000000011ca38 s4 : > 000000000011d000 > [ 19.801985] s5 : 0000000000000001 s6 : 0000000000000000 s7 : > 0000000000000000 > [ 19.841235] s8 : 0000000000177788 s9 : 0000000000176828 s10: > 0000000000000000 > [ 19.882479] s11: 00000000001777a8 t3 : ffffffffffffffff t4 : > 0000000000172f60 > [ 19.923651] t5 : 0000000000000020 t6 : 000000000000000a > [ 19.950491] status: 0000000200004020 badaddr: 00000000c01027f3 cause: > 0000000000000002 > [ 19.995864] Code: 431c f613 0017 869b 0007 ea59 000f 0220 435c cfbd (27f3) > c010 > Illegal instruction > / # > > Reading the host scounteren val and using it during reset, instead of > zeroing it, will prevent this error. It is worth noting that scounteren > is a WARL reg according to the RISC-V ISA spec, and in theory the kernel > should accept a zero val and convert it to something that won't break > the guest.
0 is legal, so the kernel (KVM) shouldn't change what userspace selects. Userspace, which includes user policy knowledge, knows best and indeed 0 is the best selection when no other policy is provided. Changing from 0 to whatever KVM has put there is wrong. > > We can't tell from userspace if the host kernel handles scounteren = 0 > during reset accordingly, so to prevent this error we'll assume that it > won't. Read the reg from the host and use it as reset value. It's not the host kernel that needs to change how it handles things. It's the guest kernel. When the guest uses ping, which likely calls gtod, which uses rdtime, or if the guest uses anything that results in an rdtime use, then it'll hit this issue if the host doesn't support sscofpmf (which the QEMU rv64 cpu type doesn't). The reason is because KVM won't expose the SBI PMU without sscofpmf and then the guest kernel pmu driver won't run, and currently the only place scounteren is getting set up is in the pmu driver. The guest kernel should unconditionally set up scounteren to match what it expects userspace to use (like enable rdtime, since the guest kernel actually implements gtod in vdso with it) > > We'll end up repeating code from kvm_riscv_get_regs_csr() to do that. > Create a helper that will get a single CSR and use it in get_regs_csr() > and in kvm_riscv_reset_regs_csr() to avoid code duplication. > > Fixes: 4db19d5b21 ("target/riscv/kvm: add missing KVM CSRs") This isn't the right tag since that is already fixed by checking get-reg-list in a previous patch. This patch is trying to fix a guest kernel bug, which it shouldn't be doing, at least not without some user toggle allowing the workaround to be turned on/off. > Signed-off-by: Daniel Henrique Barboza <dbarb...@ventanamicro.com> > --- > target/riscv/kvm/kvm-cpu.c | 73 ++++++++++++++++++++++++++++---------- > 1 file changed, 55 insertions(+), 18 deletions(-) > > diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c > index a91a87b175..918fe51257 100644 > --- a/target/riscv/kvm/kvm-cpu.c > +++ b/target/riscv/kvm/kvm-cpu.c > @@ -634,29 +634,40 @@ static int kvm_riscv_put_regs_core(CPUState *cs) > return ret; > } > > -static int kvm_riscv_get_regs_csr(CPUState *cs) > +static int kvm_riscv_get_reg_csr(CPUState *cs, > + KVMCPUConfig *csr_cfg) > { > RISCVCPU *cpu = RISCV_CPU(cs); > uint64_t reg; > - int i, ret; > + int ret; > > - for (i = 0; i < ARRAY_SIZE(kvm_csr_cfgs); i++) { > - KVMCPUConfig *csr_cfg = &kvm_csr_cfgs[i]; > + if (!csr_cfg->supported) { > + return 0; > + } > > - if (!csr_cfg->supported) { > - continue; > - } > + ret = kvm_get_one_reg(cs, csr_cfg->kvm_reg_id, ®); > + if (ret) { > + return ret; > + } > + > + if (csr_cfg->prop_size == sizeof(uint32_t)) { > + kvm_cpu_csr_set_u32(cpu, csr_cfg, reg); > + } else { > + kvm_cpu_csr_set_u64(cpu, csr_cfg, reg); > + } > + > + return 0; > +} > + > +static int kvm_riscv_get_regs_csr(CPUState *cs) > +{ > + int i, ret; > > - ret = kvm_get_one_reg(cs, csr_cfg->kvm_reg_id, ®); > + for (i = 0; i < ARRAY_SIZE(kvm_csr_cfgs); i++) { > + ret = kvm_riscv_get_reg_csr(cs, &kvm_csr_cfgs[i]); > if (ret) { > return ret; > } > - > - if (csr_cfg->prop_size == sizeof(uint32_t)) { > - kvm_cpu_csr_set_u32(cpu, csr_cfg, reg); > - } else { > - kvm_cpu_csr_set_u64(cpu, csr_cfg, reg); > - } > } > > return 0; > @@ -690,8 +701,11 @@ static int kvm_riscv_put_regs_csr(CPUState *cs) > return 0; > } > > -static void kvm_riscv_reset_regs_csr(CPURISCVState *env) > +static void kvm_riscv_reset_regs_csr(CPUState *cs, CPURISCVState *env) > { > + uint64_t scounteren_kvm_id = RISCV_CSR_REG(scounteren); > + int ret; > + > env->mstatus = 0; > env->mie = 0; > env->stvec = 0; > @@ -701,8 +715,30 @@ static void kvm_riscv_reset_regs_csr(CPURISCVState *env) > env->stval = 0; > env->mip = 0; > env->satp = 0; > - env->scounteren = 0; > env->senvcfg = 0; > + > + /* > + * Some kernels will take issue with env->scounteren = 0 > + * causing problems inside the KVM guest with 'rdtime'. > + * Read 'scounteren' from the host and use it. > + */ > + for (int i = 0; i < ARRAY_SIZE(kvm_csr_cfgs); i++) { > + KVMCPUConfig *csr_cfg = &kvm_csr_cfgs[i]; > + > + if (csr_cfg->kvm_reg_id != scounteren_kvm_id) { > + continue; > + } > + > + if (!csr_cfg->supported) { > + break; > + } > + > + ret = kvm_riscv_get_reg_csr(cs, &kvm_csr_cfgs[i]); > + if (ret) { > + error_report("Unable to retrieve scounteren from host ," > + "error %d", ret); > + } > + } > } > > static int kvm_riscv_get_regs_fp(CPUState *cs) > @@ -1711,16 +1747,17 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run > *run) > void kvm_riscv_reset_vcpu(RISCVCPU *cpu) > { > CPURISCVState *env = &cpu->env; > + CPUState *cs = CPU(cpu); > int i; > > for (i = 0; i < 32; i++) { > env->gpr[i] = 0; > } > env->pc = cpu->env.kernel_addr; > - env->gpr[10] = kvm_arch_vcpu_id(CPU(cpu)); /* a0 */ > + env->gpr[10] = kvm_arch_vcpu_id(cs); /* a0 */ > env->gpr[11] = cpu->env.fdt_addr; /* a1 */ > > - kvm_riscv_reset_regs_csr(env); > + kvm_riscv_reset_regs_csr(cs, env); > } > > void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level) > -- > 2.49.0 > > I would just drop this patch and make the default 'virt' cpu type 'max', then nobody will hit the issue. We should also fix the [guest] kernel, which I'll try to do soon. Thanks, drew