On Wed, Sep 19, 2018 at 02:11:56PM -0700, Steve Kargl wrote:
> On Wed, Sep 19, 2018 at 05:02:11PM -0400, Mark Johnston wrote:
> > On Wed, Sep 19, 2018 at 01:01:52PM -0700, Steve Kargl wrote:
> > > I have the kernel and core file if more information is needed.
> > >
> > > % cat info.2
> > > Dump header from device: /dev/ada0p3
> > >   Architecture: amd64
> > >   Architecture Version: 2
> > >   Dump Length: 2348281856
> > >   Blocksize: 512
> > >   Compression: none
> > >   Dumptime: Wed Sep 19 12:29:59 2018
> > >   Hostname: troutmask.apl.washington.edu
> > >   Magic: FreeBSD Kernel Dump
> > >   Version String: FreeBSD 12.0-ALPHA4 #0 r338505: Thu Sep 6 13:45:34 PDT 2018
> > >     ka...@troutmask.apl.washington.edu:/usr/obj/usr/src/amd64.amd64/sys/SPEW
> > >   Panic String: page fault
> > >   Dump Parity: 2676008548
> > >   Bounds: 2
> > >   Dump Status: good
> > >
> > > % more core.txt.2
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 1; apic id = 11
> > > fault virtual address = 0xffffb8000719a428
> >
> > This seems to be the result of a bit-flip. cred is 0xffffb8000719a400,
> > which is almost but not quite in the direct map. In particular we have:
> >
> > (kgdb) frame 10
> > #10 0xffffffff8083e07d in vm_object_destroy (object=<optimized out>) at /usr/src/sys/vm/vm_object.c:703
> > 703 swap_release_by_cred(object->charge, object->cred);
> > (kgdb) p object
> > $8 = <optimized out>
> > (kgdb) p *(vm_object_t)$r13
> > $9 = {
> >   ...
> >   cred = 0xffffb8000719a400,
> >   charge = 28672,
> >   umtx_data = 0x0
> > }
> > (kgdb) p *(struct ucred *)0xfffff8000719a400
> > $10 = {
> >   cr_ref = 5737,
> >   cr_uid = 1001,
> >   cr_ruid = 1001,
> >   cr_svuid = 1001,
> >   cr_ngroups = 7,
> >   cr_rgid = 1001,
> >   cr_svgid = 1001,
> >   cr_uidinfo = 0xfffff80007285500,
> >   cr_ruidinfo = 0xfffff80007285500,
> >   cr_prison = 0xffffffff80a9de10 <prison0>,
> >   ... <more sane-looking ucred fields>
> >
> > That is, flipping one of the bits in the fault address leads me to a
> > valid ucred. This could in principle be the result of a software bug,
> > but I'd be more inclined to suspect the hardware.
>
> Mark,
>
> Thanks for looking into the problem. This system has
> been running for probably 2 years or so without issues.
> I guess it's time to pull out memtest86+ (or similar)
> to see if hardware is starting to fail.
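
For what it's worth, the single-bit difference is easy to check by hand. Below is a minimal Python sketch; the addresses are copied from the kgdb session quoted above, while the variable names and the differing_bits() helper are made up for illustration and are not part of kgdb or the kernel.

```python
# Addresses copied from the quoted kgdb session.
fault_addr = 0xffffb8000719a428   # "fault virtual address" from core.txt.2
bad_cred   = 0xffffb8000719a400   # object->cred as stored in the vm_object
good_cred  = 0xfffff8000719a400   # address where kgdb shows a sane ucred


def differing_bits(a, b, width=64):
    """Return the positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(width) if (diff >> i) & 1]


diff = bad_cred ^ good_cred
bits = differing_bits(bad_cred, good_cred)

print(hex(diff), bits)
print(hex(fault_addr - bad_cred))
```

The first line of output is 0x400000000000 [46]: the stored pointer and the valid ucred address differ only in bit 46. The second line is 0x28, so the fault address lies 0x28 bytes past the corrupted cred pointer.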
I'm not sure whether you're using ECC RAM, but if not, the system is susceptible to silent random bit flips.