On Thu, Feb 4, 2021 at 7:40 PM Konstantin Belousov <kostik...@gmail.com> wrote:
> On Thu, Feb 04, 2021 at 07:01:30PM -0700, Alan Somers wrote:
> > On Thu, Feb 4, 2021 at 5:59 PM Konstantin Belousov <kostik...@gmail.com>
> > wrote:
> > > Do you have INVARIANTS enabled? If not, I am curious if enabling them
> > > would convert that rare page fault into rare "CPU %d has more MC banks"
> > > assert.
> > >
> > > Also might be the output of the
> > > # for x in $(jot $(sysctl -n hw.ncpu) 0) ; do cpucontrol -m 0x179 /dev/cpuctl$x; done
> > > command will show the issue (0x179 is the MCG_CAP MSR).
> > > You need to load cpuctl(4) if it is not loaded yet.
> > >
> >
> > I don't have INVARIANTS enabled, and I can't enable it on the production
> > servers. However, I can turn those three KASSERTs into VERIFYs and see
> > what happens. Here is what your command shows on the server that
> > panicked:
> > $ for x in $(jot $(sysctl -n hw.ncpu) 0) ; do sudo cpucontrol -m 0x179 /dev/cpuctl$x; done | uniq -c
> >      16 MSR 0x179: 0x00000000 0x0f000c14
> >      16 MSR 0x179: 0x00000000 0x0f000814
>
> It probably explains it, but it would be more telling if you left the
> output as is, so that we can see which CPUs have MCG_CMCI_P (10) bit set.

I didn't sort them, so the first 16 have bit 10 set and the second 16 don't.

>
> I suspect that your machine has two sockets, and processor in one socket
> has CPUs reporting MCG_CMCI_P, while other processor does not. Your SMP
> is not quite symmetric, perhaps processors were from different bins?

Could be. Is there some MSR that reports a more specific version number?

>
> If BSP is selected on reporting socket, everything boots well. If
> other socket wins the BSP selection race, cmci is not initialized, but
> when per-cpu mca_init() sees CMCI_P bit, it calls cmci_setup() without
> allocated cmc state, because BSP did not need it.
>
> If I am right, then unconditionally allocating the memory is probably the
> only choice there.
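To make the bit-10 check concrete, here is a small user-space C sketch (not FreeBSD code) that parses lines in the "MSR 0x179: <high word> <low word>" format cpucontrol printed above and decodes two MCG_CAP fields: bits 7:0 (number of MC banks) and bit 10 (MCG_CMCI_P). The EX_* macros and all function names are hypothetical and defined locally; they are not taken from <machine/specialreg.h>.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical local names for the MCG_CAP fields used here. */
#define EX_MCG_CAP_COUNT	0xffULL		/* bits 7:0: MC bank count */
#define EX_MCG_CAP_CMCI_P	(1ULL << 10)	/* bit 10: CMCI supported */

/*
 * Parse one line of `cpucontrol -m 0x179` output, e.g.
 * "MSR 0x179: 0x00000000 0x0f000c14" (high word first, then low word).
 * Returns true and stores the 64-bit MSR value on success.
 */
static bool
parse_cpucontrol_line(const char *line, uint64_t *cap)
{
	uint32_t msr, hi, lo;

	/* %x accepts an optional "0x" prefix, as strtoul() base 16 does. */
	if (sscanf(line, "MSR %" SCNx32 ": %" SCNx32 " %" SCNx32,
	    &msr, &hi, &lo) != 3 || msr != 0x179)
		return (false);
	*cap = ((uint64_t)hi << 32) | lo;
	return (true);
}

static bool
mcg_cap_has_cmci(uint64_t cap)
{
	return ((cap & EX_MCG_CAP_CMCI_P) != 0);
}

static unsigned
mcg_cap_bank_count(uint64_t cap)
{
	return ((unsigned)(cap & EX_MCG_CAP_COUNT));
}

/* Self-check against the two distinct values reported in this thread. */
static void
check_thread_samples(void)
{
	uint64_t cap;

	assert(parse_cpucontrol_line("MSR 0x179: 0x00000000 0x0f000c14", &cap));
	assert(mcg_cap_has_cmci(cap) && mcg_cap_bank_count(cap) == 20);
	assert(parse_cpucontrol_line("MSR 0x179: 0x00000000 0x0f000814", &cap));
	assert(!mcg_cap_has_cmci(cap) && mcg_cap_bank_count(cap) == 20);
}
```

Applied to the two values from the server, it reports 20 banks (0x14) for both, with CMCI present only in 0x0f000c14 (0xc14 has bit 10 set, 0x814 does not).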
>
> commit 2e2c925ac3b626edc6492a57a80f6b87895801c2
> Author: Konstantin Belousov <k...@freebsd.org>
> Date:   Fri Feb 5 04:32:05 2021 +0200
>
>     x86 mca: unconditionally allocate memory for cmc state
>
> diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> index 03100e77d455..dff3f7631f5c 100644
> --- a/sys/x86/x86/mca.c
> +++ b/sys/x86/x86/mca.c
> @@ -1047,7 +1047,7 @@ mca_setup(uint64_t mcg_cap)
>  	    "force_scan", CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, 0,
>  	    sysctl_mca_scan, "I", "Force an immediate scan for machine checks");
>  #ifdef DEV_APIC
> -	if (cmci_supported(mcg_cap))
> +	if (cpu_vendor_id == CPU_VENDOR_INTEL)
>  		cmci_setup();
>  	else if (amd_thresholding_supported())
>  		amd_thresholding_setup();

_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
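The suspected race, and why keying the setup in mca_setup() on the CPU vendor rather than on the BSP's own MCG_CMCI_P bit avoids it, can be modeled in user space. This is a simplified sketch under stated assumptions, not the kernel code: ex_mca_setup() stands in for the once-only BSP decision in mca_setup(), ex_mca_init_cpu() for the per-CPU path that uses the cmc state, and all ex_* names and struct ex_cmc_state are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define EX_MCG_CAP_CMCI_P	(1ULL << 10)	/* MCG_CAP bit 10 */

/* Hypothetical stand-in for the kernel's cmc state. */
struct ex_cmc_state {
	int placeholder;
};

static struct ex_cmc_state *ex_cmc_state;	/* NULL until allocated */

/*
 * BSP-side decision, made once.
 * Unpatched rule: allocate only if the BSP's own MCG_CAP has CMCI_P.
 * Patched rule: allocate on any Intel CPU, as in the diff above.
 */
static void
ex_mca_setup(uint64_t bsp_cap, bool is_intel, bool patched)
{
	bool alloc;

	alloc = patched ? is_intel : (bsp_cap & EX_MCG_CAP_CMCI_P) != 0;
	if (alloc)
		ex_cmc_state = calloc(1, sizeof(*ex_cmc_state));
}

/*
 * Per-CPU path: a CPU reporting CMCI_P goes on to use the cmc state.
 * Returns false where the real kernel would touch unallocated state.
 */
static bool
ex_mca_init_cpu(uint64_t cap)
{
	if ((cap & EX_MCG_CAP_CMCI_P) == 0)
		return (true);
	return (ex_cmc_state != NULL);
}

/* Boot ncpu Intel CPUs with CPU `bsp` as the BSP; true if none failed. */
static bool
ex_boot(const uint64_t caps[], size_t ncpu, size_t bsp, bool patched)
{
	bool ok = true;

	assert(bsp < ncpu);
	free(ex_cmc_state);
	ex_cmc_state = NULL;
	ex_mca_setup(caps[bsp], true, patched);
	for (size_t i = 0; i < ncpu; i++)
		ok = ex_mca_init_cpu(caps[i]) && ok;
	return (ok);
}
```

With half of the CPUs reporting MCG_CMCI_P, the unpatched rule fails only when the BSP comes from the socket without it, which is consistent with the panic being rare; under the patched rule the BSP's socket no longer matters.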