On Sun, Nov 25, 2018 at 11:35:30PM -0500, Garrett Wollman wrote:
> <<On Mon, 19 Nov 2018 07:09:44 +0200, Konstantin Belousov
> <kostik...@gmail.com> said:
>
> > On Sun, Nov 18, 2018 at 08:24:38PM -0500, Garrett Wollman wrote:
> >> Has anyone seen this before?  It's on a busy NFS server, but hasn't
> >> been observed on any of our other NFS servers.
> >>
> >> ------------------------------------------------------------------------
> >> Fatal trap 12: page fault while in kernel mode
>
> >> --- trap 0xc, rip = 0xffffffff809a903d, rsp = 0xfffffe17eb8d0710, rbp =
> >> 0xfffffe17eb8d0750 ---
> >> vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfffffe17eb8d0750
>
> > What is the line number for vm_page_alloc_after+0x15d ?
> > Do you have NUMA enabled on 11 ?
>
> If gdb is to be believed, the trap is at line 1687:
>
>         /*
>          * At this point we had better have found a good page.
>          */
>         KASSERT(m != NULL, ("missing page"));
>         free_count = vm_phys_freecnt_adj(m, -1);
> >>>>>>  if ((m->flags & PG_ZERO) != 0)
>                 vm_page_zero_count--;
>         mtx_unlock(&vm_page_queue_free_mtx);
>         vm_page_alloc_check(m);
>
> The faulting instruction is:
>
> 0xffffffff809a903d <vm_page_alloc_after+349>:   testb  $0x8,0x5a(%r14)
>
> There are no options matching /numa/i in the configuration.  (This is
> a non-debugging configuration so the KASSERT is inoperative, I
> assume.)  I have about a dozen other servers with the same kernel and
> they're not crashing, but obviously they all have different loads and
> sets of active clients.

If you're using a Skylake, I suspect that you can set the
hw.skz63_enable tunable to 0 as a workaround, assuming you're not
using any code that relies on Intel TSX.  (I don't think there's
anything in the base system that does.)  There are some details in
https://reviews.freebsd.org/D18374

_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
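
For reference, a minimal sketch of applying the hw.skz63_enable suggestion above. It assumes the tunable is read from the loader environment at boot, so that it belongs in /boot/loader.conf and only takes effect after a reboot. Neither the file location nor the quoting style is stated in the thread, so check the review linked above before relying on it.

```
# /boot/loader.conf
# Assumption: hw.skz63_enable is a boot-time loader tunable.
# Set it to 0 as suggested in the reply; reboot for it to take effect.
hw.skz63_enable="0"
```

As the reply notes, this is only appropriate if nothing on the machine relies on Intel TSX.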