> Date: Fri, 11 Apr 2025 08:22:59 +0000 > From: Emmanuel Dreyfus <m...@netbsd.org> > > I tried to boot NetBSD-current (i386 and amd6), BIOS and UEFI modes) on a > Dell server, it always panic on a "fatal page fault in supervisor mode" > early during the boot. The backtrace says we went there through > main/fork1/lwp_create/memset > > A screenshot is available there: > https://dl.espci.fr/ticket/2a34048a0a9749c68cfb0dc380cc2259 > > Any idea how to debug that? I expected NetBSD to lack some hardware > support on this recent machine, not to be unable to reach single user > mode.
This looks like the same issue as: PR kern/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu https://gnats.netbsd.org/57661 Some additional diagnostics from the PR: > The stack trace is a bit of a red herring, I traced down the memset to line > 343 of src/sys/arch/x86/x86/fpu.c, so we actually have: > lwp_create() -> uvm_lwp_fork() -> cpu_lwp_fork() -> fpu_lwp_fork() -> memset() > > Adding > printf("DEBUG: sizeof(pcb2->pcb_savefpu)==%ld\n", > sizeof(pcb2->pcb_savefpu)); > printf("DEBUG: x86_fpu_save_size==%d\n", x86_fpu_save_size); > > before the memset() call prints > > [ 1.8432366] DEBUG: sizeof(pcb2->pcb_savefpu)==576 > [ 1.8432366] DEBUG: x86_fpu_save_size==11008 > > Changing the VM's CPU to Sandy Bridge prints > > [ 1.8897648] DEBUG: sizeof(pcb2->pcb_savefpu)==576 > [ 1.8897648] DEBUG: x86_fpu_save_size==832 So, the x86_fpu_save_size is larger than we can handle and we don't report the problem gracefully either -- my initial attempt to put a more obvious panic in x86/identcpu.c when we initially determine x86_fpu_save_size hit the problem that it happens too early and sends the machine into a reboot loop. Unfortunately, simply expanding the space for pcb_savefpu is not trivial: 80 /* 81 * IMPORTANT NOTE: this structure, including the variable-sized FPU state at 82 * the end, must fit within one page. 83 */ 84 struct pcb { ... 103 union savefpu pcb_savefpu __aligned(64); /* floating point state */ 104 /* **** DO NOT ADD ANYTHING HERE **** */ 105 }; https://nxr.netbsd.org/xref/src/sys/arch/amd64/include/pcb.h?r=1.32#80 The top comment came from this commit: > Module Name: src > Committed By: maxv > Date: Tue Mar 17 17:18:49 UTC 2020 > > Modified Files: > src/sys/arch/amd64/include: cpu.h param.h pcb.h types.h > src/sys/arch/x86/x86: vm_machdep.c > > Log Message: > Add a redzone between the pcb and the stack. Sent to port-amd64@. https://mail-index.netbsd.org/source-changes/2020/03/17/msg115178.html So, the immediate issue is that the memset in fpu_lwp_fork is hitting the guard page. Perhaps resolving that is simply a matter of teaching cpu_uarea_alloc to advance by round_page(offsetof(struct pcb, pcb_savefpu) + x86_fpu_save_size) rather than by PAGE_SIZE when deciding where to set the guard page: 364 void * 365 cpu_uarea_alloc(bool system) 366 { 367 vaddr_t base, va; 368 paddr_t pa; 369 370 base = uvm_km_alloc(kernel_map, USPACE + PAGE_SIZE, 0, 371 UVM_KMF_WIRED|UVM_KMF_WAITVA); 372 373 /* Page[1] = RedZone */ 374 va = base + PAGE_SIZE; 375 if (!pmap_extract(pmap_kernel(), va, &pa)) { 376 panic("%s: impossible, Page[1] unmapped", __func__); 377 } 378 pmap_kremove(va, PAGE_SIZE); 379 uvm_pagefree(PHYS_TO_VM_PAGE(pa)); https://nxr.netbsd.org/xref/src/sys/arch/x86/x86/vm_machdep.c?r=1.46#352 But the bottom comment, /* **** DO NOT ADD ANYTHING HERE **** */, is much older: > Module Name: src > Committed By: dsl > Date: Thu Feb 20 18:19:10 UTC 2014 > > Modified Files: > src/sys/arch/amd64/amd64: genassym.cf machdep.c > src/sys/arch/amd64/include: pcb.h proc.h > src/sys/arch/i386/i386: locore.S machdep.c > src/sys/arch/i386/include: pcb.h proc.h > src/sys/arch/x86/x86: vm_machdep.c > src/sys/sys: param.h > > Log Message: > Move the amd64 and i386 pcb to the bottom of the uarea, and move the > kernel stack to the top. > Change the pcb layouts so that fpu save area is at the end and is > 64byte aligned ready for xsave (saving the ymm registers). > Welcome to 6.99.32 https://mail-index.netbsd.org/source-changes/2014/02/20/msg051841.html It's not obvious to me whether the comment affects anything at hand. Perhaps it's just admonishing future developers not to add things _other than fpu state_ because the fpu state is actually variable size and may extend past the size of a union savefpu object.