I have a system running 8.0-PRERELEASE with multiple drives and SATA
port multipliers (siis controllers and PMPs). All of the attached
drives are labeled via glabel(8) and then added to a ZFS pool.
During some testing to determine how the system would react to a dead
drive (simulated by physically removing a drive during operation), I
was able to produce a panic.
Now, I know that the SATA PMP and siis(4) code to handle and recover
from device errors is incomplete, but I believe the crash may be
particular to using glabel'd drives. Basically, after removing a drive
while the zpool is in use and issuing 'camcontrol reset' and 'rescan' on
the appropriate bus, the physical device associated with the drive
disappears. In this case:
(pass5:siisch7:0:15:0): lost device
(pass5:siisch7:0:15:0): removing device entry
(ada2:siisch7:0:0:0): lost device
and /dev/ada2 disappears. However, the associated glabel
/dev/label/bigdisk07 remains. Since my ZFS pool is created based on the
drive glabels, I believe this is why ZFS never notices the drives
disappear either.
Do glabels normally go away once the underlying physical device is
lost? If not, shouldn't they?
After some runtime with the physical device missing, a kernel panic is
produced:
(ada2:siisch7:0:0:0): Synchronize cache failed
(ada2:siisch7:0:0:0): removing device entry
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 14
fault virtual address = 0x48
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff8035f375
stack pointer = 0x28:0xffffff800006db60
frame pointer = 0x28:0xffffff800006db70
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 2 (g_event)
[thread pid 2 tid 100014 ]
Stopped at _mtx_lock_flags+0x15: lock cmpxchgq %rsi,0x18(%rdi)
db> bt
Tracing pid 2 tid 100014 td 0xffffff00014d4ab0
_mtx_lock_flags() at _mtx_lock_flags+0x15
vdev_geom_release() at vdev_geom_release+0x33
vdev_geom_orphan() at vdev_geom_orphan+0x15c
g_run_events() at g_run_events+0x104
g_event_procbody() at g_event_procbody+0x55
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800006dd30, rbp = 0 ---
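For what it's worth, the fault address looks consistent with a mutex
being locked through a bad pointer. The faulting instruction accesses
0x18(%rdi) and the fault virtual address is 0x48, so %rdi would have
been 0x30 at the time. A minimal sketch of that arithmetic follows; the
helper name base_register is made up for illustration, and the reading
that 0x30 is a member offset from a NULL structure pointer is only my
guess, not something I have verified against the source.

```python
def base_register(fault_addr: int, displacement: int) -> int:
    """Return the base register value implied by a faulting disp(%reg) access.

    For an operand of the form disp(%reg), the effective address is
    reg + disp, so reg = fault_addr - disp.
    """
    return fault_addr - displacement


# Values copied from the trap output above.
FAULT_ADDR = 0x48    # "fault virtual address = 0x48"
DISPLACEMENT = 0x18  # from "lock cmpxchgq %rsi,0x18(%rdi)"

rdi = base_register(FAULT_ADDR, DISPLACEMENT)
print(hex(rdi))  # prints 0x30
```

If that guess is right, vdev_geom_release() would be taking a lock
embedded in a structure that is already gone (or was never set) by the
time the orphan event runs, but I have not confirmed this.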
I'm open to trying patches and other suggestions. Thanks.
_______________________________________________
freebsd-stable@freebsd.org mailing list