On Thu, Feb 4, 2021 at 4:27 PM Mark Johnston <ma...@freebsd.org> wrote:
> On Fri, Feb 05, 2021 at 12:58:34AM +0200, Konstantin Belousov wrote:
> > On Thu, Feb 04, 2021 at 01:34:13PM -0800, Matthew Macy wrote:
> > > On Thu, Feb 4, 2021 at 1:31 PM Alan Somers <asom...@freebsd.org> wrote:
> > > >
> > > > After upgrading a machine to FreeBSD 12.2, it hit the following panic on
> > > > its first reboot.  I suspect that a few other servers have hit this too,
> > > > but since it happens before swap is mounted there are no core dumps, and
> > > > they usually reboot immediately.  The code in question hasn't changed since
> > > > 2018.  The panic happened in cmci_monitor at line 930.  Does anybody have
> > > > any suggestions for how I could debug further?  I can't readily reproduce
> > > > it, and I can't dump core, but I'd like to investigate it any way I can.
> > > > The server in question has dual Xeon Gold 6142 CPUs.
> > > >
> >
> > Try this.
> >
> > I think that there are no other dependencies in the startup order, but
> > cannot know it for sure.
> >
> > commit 19584e3d3e9606d591fa30999b370ed758960e8c
> > Author: Konstantin Belousov <k...@freebsd.org>
> > Date:   Fri Feb 5 00:56:09 2021 +0200
> >
> >     x86: init mca before APs are started
>
> APs only call mca_init() after they have been released by the BSP
> though, and that happens later in SI_SUB_SMP.
>
> > diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> > index 03100e77d455..e2bf2673cf69 100644
> > --- a/sys/x86/x86/mca.c
> > +++ b/sys/x86/x86/mca.c
> > @@ -1371,7 +1371,7 @@ mca_init_bsp(void *arg __unused)
> >
> >  	mca_init();
> >  }
> > -SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_ANY, mca_init_bsp, NULL);
> > +SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_SECOND, mca_init_bsp, NULL);
> >
> >  /* Called when a machine check exception fires. */
> >  void
>

kib's patch causes a different problem, and this one is reproducible:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8125762c
stack pointer           = 0x28:0xffffffff828dad90
frame pointer           = 0x28:0xffffffff828dad90
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 0 ()
trap number             = 12
panic: page fault
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff828daa50
vpanic() at vpanic+0x17b/frame 0xffffffff828daaa0
panic() at panic+0x43/frame 0xffffffff828dab00
trap_fatal() at trap_fatal+0x391/frame 0xffffffff828dab60
trap_pfault() at trap_pfault+0x4f/frame 0xffffffff828dabb0
trap() at trap+0x286/frame 0xffffffff828dacc0
calltrap() at calltrap+0x8/frame 0xffffffff828dacc0
--- trap 0xc, rip = 0xffffffff8125762c, rsp = 0xffffffff828dad90, rbp = 0xffffffff828dad90 ---
native_lapic_enable_cmc() at native_lapic_enable_cmc+0x1c/frame 0xffffffff828dad90
_mca_init() at _mca_init+0x94c/frame 0xffffffff828dadd0
mi_startup() at mi_startup+0xdf/frame 0xffffffff828dadf0
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 0 ]
Stopped at      kdb_enter+0x37: movq    $0,0x12bc396(%rip)

If you're wondering, the panic happens at this point in
native_lapic_enable_cmc:

        apic_id = PCPU_GET(apic_id);
        KASSERT(lapics[apic_id].la_present,
            ("%s: missing APIC %u", __func__, apic_id));
        lapics[apic_id].la_lvts[APIC_LVT_CMCI].lvt_masked = 0;  <- panic here
        lapics[apic_id].la_lvts[APIC_LVT_CMCI].lvt_active = 1;
        if (bootverbose)
                printf("lapic%u: CMCI unmasked\n", apic_id);
}
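
One possible reading of the fault address, offered as a sketch and not a confirmed diagnosis: 0x18 is what the marked line would touch if lapics were still NULL when mca_init() runs this early. The structs below are hypothetical stand-ins, not the kernel's definitions. The sketch assumes each LVT entry packs into one 32-bit word, that la_lvts is the first member of the per-APIC structure, and that APIC_LVT_CMCI is index 6. All three should be checked against sys/x86/x86/local_apic.c and apicvar.h.

```c
#include <stddef.h>

/*
 * Hypothetical model of one LVT entry: flag bits packed into a single
 * 32-bit word.  The layout is an assumption, not the kernel's definition.
 */
struct lvt_sketch {
	unsigned int lvt_edgetrigger:1;
	unsigned int lvt_activehi:1;
	unsigned int lvt_masked:1;
	unsigned int lvt_active:1;
	unsigned int lvt_mode:16;
	unsigned int lvt_vector:8;
};

#define	SKETCH_LVT_CMCI	6	/* assumed value of APIC_LVT_CMCI */
#define	SKETCH_LVT_MAX	6	/* assumed highest LVT index */

/* Hypothetical per-APIC record with the LVT array as its first member. */
struct lapic_sketch {
	struct lvt_sketch la_lvts[SKETCH_LVT_MAX + 1];
	unsigned int la_present:1;
};

/*
 * Byte offset, relative to the base of the lapics array, of the word
 * holding la_lvts[CMCI] for the given APIC id.  If the base pointer is
 * NULL, this offset is the faulting virtual address.
 */
static size_t
cmci_lvt_offset(unsigned int apic_id)
{
	return (apic_id * sizeof(struct lapic_sketch) +
	    offsetof(struct lapic_sketch, la_lvts) +
	    SKETCH_LVT_CMCI * sizeof(struct lvt_sketch));
}
```

Under those assumptions, cmci_lvt_offset(0) is 6 * 4 = 0x18, which matches the reported fault virtual address for apic id 00. Since KASSERT only compiles in with INVARIANTS, the lvt_masked update would be the first access to lapics on a kernel built without it.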