https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211013
Bug ID: 211013
Summary: Write error to UFS filesystem with softupdates panics machine
Product: Base System
Version: 11.0-BETA1
Hardware: Any
OS: Any
Status: New
Severity: Affects Many People
Priority: ---
Component: kern
Assignee: freebsd-bugs@FreeBSD.org
Reporter: k...@denninger.net

The machine in question had mounted a UFS filesystem with softupdates enabled (on an SD card; I was updating a system that runs FreeBSD on a Raspberry Pi2 by plugging the card into a different machine) and the card took an unrecoverable write error.

The result was a kernel panic. This is apparently considered expected behavior at present when softupdates are turned on for the filesystem, because the filesystem may now be corrupted and there is no way to be sure while the machine is running. Thus the choice to panic() when this situation occurs.

But the choice to panic() appears to be too broad and unnecessary: in many cases a less-severe action is effective without exposing the system to the risk of unknown filesystem corruption.

Yes, if there are working-set pages on that volume and it is corrupt, the system is no longer stable (this is especially true if the system is *running* from that volume.) It is also true that on a solid-state device of some kind the impact of a write error may cross a filesystem boundary, so it is insufficient to invalidate only the filesystem (on an SSD or similar device the read/erase/write cycle for a data re-write may involve many megabytes of data, and that data may not be entirely local to the mounted filesystem if there is more than one filesystem on the physical volume.)

HOWEVER, forcibly detaching the volume in question instead of calling panic() *should* be effective in preventing the possibility of propagating a corrupted filesystem.
While this will still lead to a panic if executing RSS (or consumed page file space) is present on that volume, in the case where the device holds only data the detach will *not* panic the machine.

This appears to be a situation where a less-severe "remedy" for a failed I/O is certainly called for.

The following backtrace was captured from the panic itself:

root@Dbms2:/var/crash # kgdb /boot/kernel/kernel vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: initiate_write_inodeblock_ufs2: already started
cpuid = 14
KDB: stack backtrace:
#0 0xffffffff80b1f357 at kdb_backtrace+0x67
#1 0xffffffff80ad6ec2 at vpanic+0x182
#2 0xffffffff80ad6d33 at panic+0x43
#3 0xffffffff80dc16ad at softdep_disk_io_initiation+0x159d
#4 0xffffffff80de61eb at ffs_geom_strategy+0x13b
#5 0xffffffff80b872f7 at bufwrite+0x267
#6 0xffffffff80b8ac6a at vfs_bio_awrite+0x3ca
#7 0xffffffff80b96b77 at vop_stdfsync+0x277
#8 0xffffffff80983766 at devfs_fsync+0x26
#9 0xffffffff81101f7d at VOP_FSYNC_APV+0x8d
#10 0xffffffff80baf1ae at sched_sync+0x3be
#11 0xffffffff80a8dcb5 at fork_exit+0x85
#12 0xffffffff80f7f85e at fork_trampoline+0xe
Uptime: 27m9s

(kgdb) where
#0  doadump (textdump=<value optimized out>) at pcpu.h:221
#1  0xffffffff80ad6949 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80ad6efb in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80ad6d33 in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80dc16ad in softdep_disk_io_initiation (bp=<value optimized out>)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:10301
#5  0xffffffff80de61eb in ffs_geom_strategy (bo=<value optimized out>,
    bp=<value optimized out>) at buf.h:412
#6  0xffffffff80b872f7 in bufwrite (bp=0xfffffe02e8629b30) at buf.h:405
#7  0xffffffff80b8ac6a in vfs_bio_awrite (bp=<value optimized out>)
    at buf.h:393
#8  0xffffffff80b96b77 in vop_stdfsync (ap=0xfffffe034f481b68)
    at /usr/src/sys/kern/vfs_default.c:692
#9  0xffffffff80983766 in devfs_fsync (ap=0xfffffe034f481b68)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:702
#10 0xffffffff81101f7d in VOP_FSYNC_APV (vop=<value optimized out>,
    a=<value optimized out>) at vnode_if.c:1331
#11 0xffffffff80baf1ae in sched_sync () at vnode_if.h:549
#12 0xffffffff80a8dcb5 in fork_exit (callout=0xffffffff80baedf0 <sched_sync>,
    arg=0x0, frame=0xfffffe034f481c00) at /usr/src/sys/kern/kern_fork.c:1038
#13 0xffffffff80f7f85e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#14 0x0000000000000000 in ?? ()
(kgdb)

FreeBSD 11.0-BETA1 #0 r302439: Fri Jul 8 14:37:27 CDT 2016
k...@dbms2.denninger.net:/usr/obj/usr/src/sys/GENERIC

The offending code line:

static void
initiate_write_inodeblock_ufs2(inodedep, bp)
        struct inodedep *inodedep;
        struct buf *bp;                 /* The inode block */
{
        struct allocdirect *adp, *lastadp;
        struct ufs2_dinode *dp;
        struct ufs2_dinode *sip;
        struct inoref *inoref;
        struct ufsmount *ump;
        struct fs *fs;
        ufs_lbn_t i;
#ifdef INVARIANTS
        ufs_lbn_t prevlbn = 0;
#endif
        int deplist;

        if (inodedep->id_state & IOSTARTED)
                panic("initiate_write_inodeblock_ufs2: already started");
        inodedep->id_state |= IOSTARTED;

-- End capture --
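The less-severe handling proposed in this report can be illustrated with a small user-space model. This is only a sketch of the decision being argued for, not FreeBSD kernel code: struct volume_model, enum io_action and handle_write_error() are hypothetical names invented for the illustration, and the assumption that the kernel can tell whether a volume backs executing pages or swap is exactly that, an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of an unrecoverable write error. */
enum io_action {
        IO_ACTION_DETACH,       /* volume detached, machine keeps running */
        IO_ACTION_PANIC         /* detach pulled the working set away */
};

/* Hypothetical model of a mounted volume; not a kernel structure. */
struct volume_model {
        const char *name;
        bool holds_working_set; /* executing pages or swap live here */
        bool detached;          /* no further I/O reaches the device */
};

/*
 * Model of the proposed policy: always detach the whole volume first,
 * so that nothing more is written to a possibly corrupt device, and
 * report a panic only when the volume backed the running system.
 */
static enum io_action
handle_write_error(struct volume_model *vol)
{
        assert(vol != NULL);

        vol->detached = true;
        if (vol->holds_working_set)
                return (IO_ACTION_PANIC);
        return (IO_ACTION_DETACH);
}
```

In this model a data-only volume (holds_working_set false) yields IO_ACTION_DETACH, while a volume the system is running from still yields IO_ACTION_PANIC. The whole volume is detached rather than a single filesystem because, as noted in the report, a failed write on flash media may affect data outside the filesystem that issued it.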