https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238817

            Bug ID: 238817
           Summary: g_raid3_access race on destruction
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: b...@freebsd.org
          Reporter: rli...@freebsd.org
                CC: ma...@freebsd.org

It seems like the g_raid3_softc can be destroyed in between
g_topology_unlock() and sx_xlock(&sc->sc_lock) in g_raid3_access().

There may need to be some kind of flag on the softc protected by the topology
lock to indicate an in-progress topology to softc relocking, to prevent
destruction in the meantime (perhaps by checking for it in
g_raid3_can_destroy()).

I am not sure if this pattern may affect other sites where this is done too,
like g_raid3_destroy_geom(). I am also not sure if it may affect other geom
classes, like gmirror, or if they might not suffer from this problem due to
different conditions under which their softcs are destroyed.

With the same set up as in bug 238814:

sysctl kern.geom.raid3.debug=4
sysctl debug.fail_point.mnowait=1%return
while true; do kyua test -k /usr/tests/sys/geom/class/raid3/Kyuafile; done

[...]
GEOM_RAID3[0]: Request failed (error=28). md1[WRITE(offset=1024, length=1024)]
GEOM_RAID3[4]: g_raid3_event_send: Sending event 0xfffff800163e2a80.
GEOM_RAID3[4]: g_raid3_event_send: Waking up 0xfffff800048bd400.
GEOM_RAID3[0]: Request failed (error=28). md2[WRITE(offset=1024, length=1024)]
GEOM_RAID3[4]: g_raid3_event_send: Sending event 0xfffff800163e2ac0.
GEOM_RAID3[4]: g_raid3_event_send: Waking up 0xfffff800048bd400.
GEOM_RAID3[0]: Request failed. raid3/graid3.rlWY7w[WRITE(offset=2048, length=2048)]
GEOM_RAID3[3]: Running event for disk md1.
GEOM_RAID3[3]: Changing disk md1 state from ACTIVE to DISCONNECTED.
GEOM_RAID3[1]: Disk md1 state changed from ACTIVE to DISCONNECTED (device graid3.rlWY7w).
GEOM_RAID3[0]: Device graid3.rlWY7w: provider md1 disconnected.
GEOM_RAID3[2]: Access request for raid3/graid3.rlWY7w: r-1w-1e0.
GEOM_RAID3[2]: Consumer md1 destroyed.
GEOM_RAID3[2]: Access md1 r-1w-1e-1 = 0
GEOM_RAID3[1]: Device graid3.rlWY7w: genid bumped to 1.
GEOM_RAID3[2]: Metadata on md0 updated.
GEOM_RAID3[2]: Metadata on md2 updated.
GEOM_RAID3[1]: Device graid3.rlWY7w state changed from COMPLETE to DEGRADED.
GEOM_RAID3[3]: Running event for disk md2.
GEOM_RAID3[3]: Changing disk md2 state from ACTIVE to DISCONNECTED.
GEOM_RAID3[1]: Disk md2 state changed from ACTIVE to DISCONNECTED (device graid3.rlWY7w).
GEOM_RAID3[0]: Device graid3.rlWY7w: provider md2 disconnected.
GEOM_RAID3[1]: Consumer md1 destroyed.
GEOM_RAID3[2]: Consumer md2 destroyed.
GEOM_RAID3[2]: Access md2 r-1w-1e-1 = 0
GEOM_RAID3[0]: Device graid3.rlWY7w: provider raid3/graid3.rlWY7w destroyed.
GEOM_RAID3[2]: No I/O requests for graid3.rlWY7w, it can be destroyed.
GEOM_RAID3[2]: Metadata on md0 updated.
GEOM_RAID3[2]: Consumer md0 destroyed.
GEOM_RAID3[2]: Access md0 r-1w-1e-1 = 0
GEOM_RAID3[0]: Device graid3.rlWY7w destroyed.
GEOM_RAID3[1]: Thread exiting.

Fatal trap 9: general protection fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer     = 0x20:0xffffffff80ba77b4
stack pointer           = 0x28:0xfffffe00512813b0
frame pointer           = 0x28:0xfffffe0051281450
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1137 (dd)
trap number             = 9
panic: general protection fault
cpuid = 2
time = 1561520628
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00512810c0
vpanic() at vpanic+0x19d/frame 0xfffffe0051281110
panic() at panic+0x43/frame 0xfffffe0051281170
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00512811d0
trap() at trap+0x6c/frame 0xfffffe00512812e0
calltrap() at calltrap+0x8/frame 0xfffffe00512812e0
--- trap 0x9, rip = 0xffffffff80ba77b4, rsp = 0xfffffe00512813b0, rbp = 0xfffffe0051281450 ---
_sx_xlock_hard() at _sx_xlock_hard+0x274/frame 0xfffffe0051281450
_sx_xlock() at _sx_xlock+0xc1/frame 0xfffffe0051281490
g_raid3_access() at g_raid3_access+0x11c/frame 0xfffffe00512814e0
g_access() at g_access+0x28e/frame 0xfffffe0051281550
g_dev_close() at g_dev_close+0x158/frame 0xfffffe00512815a0
devfs_close() at devfs_close+0x2e4/frame 0xfffffe0051281610
VOP_CLOSE_APV() at VOP_CLOSE_APV+0x60/frame 0xfffffe0051281630
vn_close1() at vn_close1+0xe3/frame 0xfffffe00512816a0
vn_closefile() at vn_closefile+0x4c/frame 0xfffffe0051281720
devfs_close_f() at devfs_close_f+0x2c/frame 0xfffffe0051281750
_fdrop() at _fdrop+0x1a/frame 0xfffffe0051281770
closef() at closef+0x1ec/frame 0xfffffe0051281800
fdescfree_fds() at fdescfree_fds+0x8c/frame 0xfffffe0051281850
fdescfree() at fdescfree+0x37a/frame 0xfffffe0051281910
exit1() at exit1+0x4fe/frame 0xfffffe0051281980
sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe0051281990
amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe0051281ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0051281ab0
--- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8003c892a, rsp = 0x7fffffffd908, rbp = 0x7fffffffd920 ---
KDB: enter: panic
[ thread pid 1137 tid 100201 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db> x/x ticks
ticks: 7fff7475
db> x/x g_udnf_last_ticks
g_udnf_last_ticks: 7fff7472
db> x/s g_udnf_last_name
g_udnf_last_name: md2
db> x/d g_udnf_last_tid
g_udnf_last_tid: 100193
db> x/aS g_udnf_last_stack+0x8,0x12
g_udnf_last_stack+0x8: uma_dbg_nowait_fail_record+0x31
g_udnf_last_stack+0x10: zalloc_inject_failure+0x4c
g_udnf_last_stack+0x18: uma_zalloc_arg+0xa98
g_udnf_last_stack+0x20: mdstart_malloc+0x81d
g_udnf_last_stack+0x28: md_kthread+0x20c
g_udnf_last_stack+0x30: fork_exit+0x84
g_udnf_last_stack+0x38: fork_trampoline+0xe
g_udnf_last_stack+0x40: 0
g_udnf_last_stack+0x48: 0
g_udnf_last_stack+0x50: 0
g_udnf_last_stack+0x58: 0
g_udnf_last_stack+0x60: 0
g_udnf_last_stack+0x68: 0
g_udnf_last_stack+0x70: 0
g_udnf_last_stack+0x78: 0
g_udnf_last_stack+0x80: 0
g_udnf_last_stack+0x88: 0
g_udnf_last_stack+0x90: 0
db> x/s version
version: FreeBSD 13.0-CURRENT #42 r349025+3bdd0fc24f5b(mnowait-dbg)-dirty: Tue Jun 25 20:34:27 PDT 2019\012 r...@vali.kishkinda.net:/usr/obj/usr/src/freebsd/amd64.amd64/sys/GENERIC\012
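
The flag suggested in the report (set under the topology lock before it is dropped in g_raid3_access(), and checked from g_raid3_can_destroy()) might look roughly like the userspace model below. This is only a sketch of the idea, not kernel code and not a proposed patch: every name in it (model_softc, relock_pending, the model_* functions) is hypothetical, both locks are modeled as plain booleans so the interleaving can be stepped through deterministically, and whatever other conditions the real g_raid3_can_destroy() checks are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in sys/geom/raid3. */
struct model_softc {
	bool topology_locked;	/* stands in for the GEOM topology lock */
	bool sc_locked;		/* stands in for sc->sc_lock */
	int relock_pending;	/* proposed flag, protected by the topology lock */
	bool destroyed;		/* set once the softc would have been freed */
};

/*
 * Analogue of the extra check the report suggests for g_raid3_can_destroy().
 * The real function's other conditions are intentionally omitted.
 */
static bool
model_can_destroy(const struct model_softc *sc)
{
	assert(sc->topology_locked);
	return (sc->relock_pending == 0);
}

/* Destruction path: takes the topology lock and refuses while a relock is pending. */
static bool
model_try_destroy(struct model_softc *sc)
{
	bool ok;

	assert(!sc->topology_locked);
	sc->topology_locked = true;
	ok = model_can_destroy(sc);
	if (ok)
		sc->destroyed = true;
	sc->topology_locked = false;
	return (ok);
}

/*
 * First half of the access path: entered with the topology lock held,
 * marks the relock as pending, then drops the topology lock.
 * The window described in the report opens when this returns.
 */
static void
model_access_begin(struct model_softc *sc)
{
	assert(sc->topology_locked);
	sc->relock_pending++;
	sc->topology_locked = false;
}

/*
 * Second half: takes and releases the softc lock, retakes the topology
 * lock, and only then clears the pending mark. Returns with the topology
 * lock held, as the caller expects.
 */
static void
model_access_end(struct model_softc *sc)
{
	assert(!sc->destroyed);	/* a use-after-free in the kernel */
	assert(!sc->sc_locked);
	sc->sc_locked = true;
	/* ... access bookkeeping would happen here ... */
	sc->sc_locked = false;
	sc->topology_locked = true;
	sc->relock_pending--;
}
```

Stepping through the interleaving from the report: a destroy attempt made between model_access_begin() and model_access_end() is refused because relock_pending is nonzero, and it succeeds once the access path has retaken the topology lock, cleared the mark, and its caller has released the lock.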