Dear Rainer,

if no one else is experiencing these errors, then it is probably a hardware error. I once encountered problems with incompatible memory: a particular combination of memory modules (fine in other boards) and board (fine with other memory) did not work properly. The issue could be triggered with a simple buildworld within minutes, yet memtest86 could run for hours without finding anything. Please don't take a clean memtest86 run as proof that the memory is fine.
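A minimal sketch of how such a buildworld stress test could be repeated unattended, assuming Python is available on the machine. The helper name run_until_failure, the source path /usr/src and the -j value are illustrative assumptions, not something taken from this thread. A kernel panic would end the loop along with the machine, so the sketch only records failures that leave the system up, such as a crashing compiler.

```python
import subprocess


def run_until_failure(cmd, max_runs=10):
    """Run cmd up to max_runs times.

    Returns the 1-based number of the first run that exits non-zero
    (including death by signal), or None if every run succeeded.
    """
    for attempt in range(1, max_runs + 1):
        # check=False: a non-zero exit is what we are looking for,
        # so it must not be turned into an exception.
        completed = subprocess.run(cmd, check=False)
        if completed.returncode != 0:
            return attempt
    return None


# Hypothetical usage (path and -j value are assumptions, adjust to the machine):
#   failed = run_until_failure(["make", "-C", "/usr/src", "-j8", "buildworld"])
#   print("first failing run:", failed)
```

If the failing run number changes from one attempt to the next with the same sources, that points towards hardware rather than a reproducible software bug.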
To be sure: check / adjust memory timings, remove and/or replace memory modules and try to trigger the error again.

Best regards,
Holger Kipp

> On 3. Oct 2018, at 15:01, "rai...@ultra-secure.de" <rai...@ultra-secure.de> wrote:
>
> Hi,
>
> I created a PR for this, but maybe somebody here can help.
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296
>
>
> I have a HP DL380 Gen10 server with a smartpqi(4) HBA and some disks
>
> smartpqi0: <E208i-p SR Gen10> port 0x4000-0x40ff mem 0xe2800000-0xe2807fff at device 0.0 numa-domain 0 on pci4
> smartpqi0: using MSI-X interrupts (16 vectors)
> smartpqi1: <P408i-a SR Gen10> port 0xc000-0xc0ff mem 0xf3800000-0xf3807fff at device 0.0 numa-domain 0 on pci9
> smartpqi1: using MSI-X interrupts (16 vectors)
> ses0 at smartpqi0 bus 0 scbus0 target 187 lun 0
> ses1 at smartpqi1 bus 0 scbus1 target 187 lun 0
> da2 at smartpqi1 bus 0 scbus1 target 64 lun 0
> da7 at smartpqi1 bus 0 scbus1 target 69 lun 0
> da5 at smartpqi1 bus 0 scbus1 target 67 lun 0
> da3 at smartpqi1 bus 0 scbus1 target 65 lun 0
> da8 at smartpqi1 bus 0 scbus1 target 70 lun 0
> da4 at smartpqi1 bus 0 scbus1 target 66 lun 0
> da9 at smartpqi1 bus 0 scbus1 target 71 lun 0
> pass3 at smartpqi0 bus 0 scbus0 target 1088 lun 0
> da0 at smartpqi0 bus 0 scbus0 target 64 lun 0
> da6 at smartpqi1 bus 0 scbus1 target 68 lun 0
> pass13 at smartpqi1 bus 0 scbus1 target 1088 lun 0
> da1 at smartpqi0 bus 0 scbus0 target 66 lun 0
>
>
> This server can be made to panic relatively easily by rsyncing packed logfiles over to it and unpacking them.
>
> This is (hopefully) a backtrace of a crashdump resulting from one of those panics:
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 3; apic id = 03
> fault virtual address = 0x5a
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff80dff90d
> stack pointer = 0x28:0xfffffe084ed93f00
> frame pointer = 0x28:0xfffffe084ed93f40
> code segment = base rx0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 0 (zio_write_issue_10)
> trap number = 12
> panic: page fault
> cpuid = 3
> KDB: stack backtrace:
> #0 0xffffffff80b3d567 at kdb_backtrace+0x67
> #1 0xffffffff80af6b07 at vpanic+0x177
> #2 0xffffffff80af6983 at panic+0x43
> #3 0xffffffff80f77fcf at trap_fatal+0x35f
> #4 0xffffffff80f78029 at trap_pfault+0x49
> #5 0xffffffff80f777f7 at trap+0x2c7
> #6 0xffffffff80f57dac at calltrap+0x8
> #7 0xffffffff80dee7e2 at kmem_back+0xf2
> #8 0xffffffff80dee6c0 at kmem_malloc+0x60
> #9 0xffffffff80de6172 at keg_alloc_slab+0xe2
> #10 0xffffffff80de8b7e at keg_fetch_slab+0x14e
> #11 0xffffffff80de83b4 at zone_fetch_slab+0x64
> #12 0xffffffff80de848f at zone_import+0x3f
> #13 0xffffffff80de4b99 at uma_zalloc_arg+0x3d9
> #14 0xffffffff82351ab2 at zio_write_compress+0x1e2
> #15 0xffffffff8235074c at zio_execute+0xac
> #16 0xffffffff80b4ed74 at taskqueue_run_locked+0x154
> #17 0xffffffff80b4fed8 at taskqueue_thread_loop+0x98
> Uptime: 40m34s
> Dumping 5489 out of 32379 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>
> Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from /usr/lib/debug//boot/kernel/geom_mirror.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/geom_mirror.ko
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/opensolaris.ko
> Reading symbols from /boot/kernel/accf_data.ko...Reading symbols from /usr/lib/debug//boot/kernel/accf_data.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/accf_data.ko
> Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from /usr/lib/debug//boot/kernel/accf_http.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/accf_http.ko
> Reading symbols from /boot/kernel/cc_htcp.ko...Reading symbols from /usr/lib/debug//boot/kernel/cc_htcp.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/cc_htcp.ko
> Reading symbols from /boot/kernel/ums.ko...Reading symbols from /usr/lib/debug//boot/kernel/ums.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/ums.ko
> Reading symbols from /boot/kernel/tmpfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/tmpfs.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/tmpfs.ko
> #0 0xffffffff80af68fb in doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:309
> 309 if (dumping)
> (kgdb) bt
> #0 0xffffffff80af68fb in doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:309
> #1 0xffffffff80af6925 in doadump (textdump=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:315
> #2 0xffffffff80af671b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:382
> #3 0xffffffff80af6b41 in vpanic (fmt=<value optimized out>, ap=0xfffffe084ed93c50) at /usr/src/sys/kern/kern_shutdown.c:769
> #4 0xffffffff80af6983 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:706
> #5 0xffffffff80f77fcf in trap_fatal (frame=0xfffffe084ed93e40, eva=90) at /usr/src/sys/amd64/amd64/trap.c:875
> #6 0xffffffff80f78029 in trap_pfault (frame=0xfffffe084ed93e40, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:712
> #7 0xffffffff80f777f7 in trap (frame=0xfffffe084ed93e40) at /usr/src/sys/amd64/amd64/trap.c:514
> #8 0xffffffff80f57dac in Xtss_pti () at /usr/src/sys/amd64/amd64/exception.S:159
> #9 0xffffffff80dff90d in vm_page_rename (m=0x3ff, new_object=0xfffff80018d8d000, new_pindex=<value optimized out>) at /usr/src/sys/vm/vm_page.c:1342
> #10 0xffffffff80dee7e2 in kmem_suballoc (parent=0x262, min=0x14000, max=0xffffffff81ebc558, size=874980, superpage_align=<value optimized out>) at /usr/src/sys/vm/vm_kern.c:290
> #11 0xffffffff80dee6c0 in kmem_alloc_contig (vmem=0xfffffe00d59d0000, size=18446744071594296576, flags=<value optimized out>, low=18446735303990395200, high=257, alignment=18446735278033391616, boundary=18446735278033391616, memattr=-16 '\360') at /usr/src/sys/vm/vm_kern.c:254
> #12 0xffffffff80de6172 in uma_prealloc (zone=0x0, items=1322860228) at /usr/src/sys/vm/uma_core.c:3150
> #13 0xfffff806240140f0 in ?? ()
> #14 0xfffffe00c51f357e in ?? ()
> #15 0xfffffe00d59b0000 in ?? ()
> #16 0xfffff8000d460498 in ?? ()
> #17 0xfffff80624014140 in ?? ()
> #18 0x02fffe00c520c000 in ?? ()
> #19 0xfffff8000d460480 in ?? ()
> #20 0xfffff8000d4641c0 in ?? ()
> #21 0x0000000000000000 in ?? ()
> Current language: auto; currently minimal
>
>
> As I said in the PR, I've had memtest86 running for 8h with no reported problem. So I think I can rule out memory problems.
>
> I don't really have any experience debugging panics because in the last 20-odd years of running FreeBSD, there rarely were any...
>
>
>
> Best Regards
> Rainer
>
>
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"