Conor Dooley <co...@kernel.org> writes:

> On Wed, Jan 24, 2024 at 01:49:51PM +0100, Björn Töpel wrote:
>> Hi!
>>
>> I bumped the RISC-V Linux kernel CI to use qemu 8.2.0, and realized that
>> thead c906 didn't boot anymore. Bisection points to commit d6a427e2c0b2
>> ("target/riscv/cpu.c: restrict 'marchid' value")
>>
>> Reverting that commit, or the hack below solves the boot issue:
>>
>> --8<--
>> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
>> index 8cbfc7e781ad..e18596c8a55a 100644
>> --- a/target/riscv/cpu.c
>> +++ b/target/riscv/cpu.c
>> @@ -505,6 +505,9 @@ static void rv64_thead_c906_cpu_init(Object *obj)
>>      cpu->cfg.ext_xtheadsync = true;
>>
>>      cpu->cfg.mvendorid = THEAD_VENDOR_ID;
>> +    cpu->cfg.marchid = ((QEMU_VERSION_MAJOR << 16) |
>> +                        (QEMU_VERSION_MINOR << 8)  |
>> +                        (QEMU_VERSION_MICRO));
>>  #ifndef CONFIG_USER_ONLY
>>      set_satp_mode_max_supported(cpu, VM_1_10_SV39);
>>  #endif
>> --8<--
>>
>> I'm unsure what the correct qemu way of adding a default value is,
>> or if c906 should have a proper marchid.
>
> The "correct" marchid/mimpid values for the c906 are zero.
Ok! Thanks for clearing that up for me.

> I haven't looked into the code at all, so I am "assuming" that it is
> being zero initialised at present. Linux applies the errata fixups for
> the c906 when archid and impid are both zero - so your patch will avoid
> these fixups being applied.

I'm also assuming 0; will double-check. Hmm, that means that the
*previous* marchid was incorrect (pre d6a427e2c0b2).

> Do you think that perhaps the emulation in QEMU does not support what
> the kernel uses once the errata fixups are enabled?

Took a quick look at the c906 "in_asm,int" logs:

  | 0x80201040:  12000073  sfence.vma  zero,zero
  | 0x80201044:  18051073  csrrw       zero,satp,a0
  |
  | riscv_cpu_do_interrupt: hart:0, async:0, cause:000000000000000c, epc:0x0000000080201048, tval:0x0000000080201048, desc=exec_page_fault
  | riscv_cpu_do_interrupt: hart:0, async:0, cause:000000000000000c, epc:0xffffffff80001048, tval:0xffffffff80001048, desc=exec_page_fault
  | ...cont forever

So it looks like we're tripping over the page tables when we're turning
on paging.

Hmm, maybe it's not qemu, but the c906 that has been broken for a while?

I'll disable it temporarily from CI anyhow, and will continue digging.

Thanks for the pointers/clarifications, Conor!
Björn