>Number: 148368
>Category: misc
>Synopsis: ZFS hanging forever on 8.1-PRERELEASE
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sun Jul 04 23:10:04 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator: Rich Ercolani
>Release: RELENG_8 from June 15th
>Organization:
JHU ACM
>Environment:
FreeBSD manticore.acm.jhu.edu 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #0: Wed Jun 16 17:10:42 UTC 2010 r...@[removed]:/usr/obj/usr/local/ncvs/src/sys/DTRACE amd64
>Description:
Occasionally, much to our chagrin, drives malfunction. When this happens, ZFS and company appear to "handle" the errors correctly, but in practice the system often needs a reboot before they become responsive again [e.g. "zpool scrub [affected pool]" will hang forever without returning to a shell, and eventually "zpool status" will hang forever as well].

I've seen this problem before, but we were running an old kernel [circa November 2009] from RELENG_8 and presumed it would go away on upgrade.

The kernel config is the GENERIC config with the following modifications:

# diff GENERIC DTRACE
19c19
< # $FreeBSD: src/sys/amd64/conf/GENERIC,v 1.531.2.13 2010/05/02 06:24:17 imp Exp $
---
> # $FreeBSD: src/sys/amd64/conf/GENERIC,v 1.531.2.8 2010/01/18 00:53:21 imp Exp $
22c22
< ident GENERIC
---
> ident DTRACE
57c57
< options COMPAT_FREEBSD32 # Compatible with i386 binaries
---
> options COMPAT_IA32 # Compatible with i386 binaries
76,77c76,78
< #options KDTRACE_FRAME # Ensure frames are compiled in
< #options KDTRACE_HOOKS # Kernel DTrace hooks
---
> options KDTRACE_FRAME # Ensure frames are compiled in
> options KDTRACE_HOOKS # Kernel DTrace hooks
> options DDB_CTF # Still more Dtrace-related hooks
227d227
< device sge # Silicon Integrated Systems SiS190/191
284d283
< options USB_DEBUG # enable debug msgs

I'm sorry I can't include a precise revision number for the kernel; I used cvsup to pull it, and I don't know how to extract the revision number.

I'm going to try pulling and installing the latest RELENG_8 and see if that helps.
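
For anyone trying to catch the hang described above before the whole box wedges, a rough sketch of a probe that runs a command with a deadline might look like the following. This is not part of the original report: the helper name responds_within and the 30-second figure in the usage comment are made up for illustration, and the only assumption about ZFS is that "zpool status" normally returns promptly.

```python
import subprocess
import sys


def responds_within(cmd, timeout_s):
    """Return True if cmd exits within timeout_s seconds, else False.

    Hypothetical helper, not from the report. The exit status is ignored;
    only whether the command came back at all is reported.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        # Ask the kernel to kill it, but do not wait for it to die:
        # a process blocked inside the kernel may never go away.
        proc.kill()
        print(f"no response after {timeout_s}s: {' '.join(cmd)}",
              file=sys.stderr)
        return False


# Possible use from cron or a monitoring script (timeout is a guess):
#   responds_within(["zpool", "status"], 30)
```

The function deliberately does not wait for the child after kill(). As far as I know, subprocess.run() with a timeout does wait after killing the child on POSIX systems, so it could itself block forever on a process stuck in an uninterruptible kernel sleep, which is exactly the situation being probed for here.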

For reference, the errors printed in the kernel log when the zpool reported read/write errors on a disk:

Jul 4 05:03:29 manticore kernel: arcmsr0:block 'read/write' commandwith gone raid volume Cmd= a, TargetId=1, Lun=4
Jul 4 05:03:29 manticore kernel: arcmsr0:block 'read/write' commandwith gone raid volume Cmd= a, TargetId=1, Lun=4
Jul 4 05:03:29 manticore kernel: arcmsr0:block 'read/write' commandwith gone raid volume Cmd= 8, TargetId=1, Lun=4

Status of the pool now:

  pool: cannoli
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h13m, 0.87% done, 25h56m to go
config:

        NAME        STATE     READ WRITE CKSUM
        cannoli     ONLINE       0     0     0
          da5       ONLINE       0     0     0
          da6       ONLINE       0     0     0
          da2       ONLINE       0     0     0
          da4       ONLINE       0     0     4

errors: 1 data errors, use '-v' for a list

At this point, the system will fail to reboot cleanly, as it spends forever waiting for the zfs filesystems to cleanly unmount [presumably].

My next kernel will have DDB built in.
>How-To-Repeat:
1) Have a disk which occasionally reports uncorrected read/write errors with a ZFS filesystem on it.
2) ZFS will eventually completely cease to respond to all queries using the "zpool" or "zfs" commands. [Traffic to the mounted filesystems is fine for much longer, until the point where the entire system becomes unresponsive.]
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
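
As a side note on reading the pool status quoted in this report: while "zpool status" still answers, the per-device counters can be pulled out mechanically. The sketch below is an illustration only, not something from the original report. The function name devices_with_errors is invented, the sample text is copied from the status output above, and it assumes the five-column NAME/STATE/READ/WRITE/CKSUM layout shown there with plain integer counters.

```python
SAMPLE_STATUS = """\
  pool: cannoli
 state: ONLINE
 scrub: scrub in progress for 0h13m, 0.87% done, 25h56m to go
config:

        NAME        STATE     READ WRITE CKSUM
        cannoli     ONLINE       0     0     0
          da5       ONLINE       0     0     0
          da6       ONLINE       0     0     0
          da2       ONLINE       0     0     0
          da4       ONLINE       0     0     4

errors: 1 data errors, use '-v' for a list
"""


def devices_with_errors(status_text):
    """Map device name -> (read, write, cksum) for rows with a nonzero counter.

    Hypothetical helper. Rows whose counters are not plain integers
    (for example abbreviated values) are skipped rather than guessed at.
    """
    flagged = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "config:":
            in_config = True
            continue
        if fields[0] == "errors:":
            break
        if not in_config or fields[0] == "NAME" or len(fields) < 5:
            continue
        name, counters = fields[0], fields[2:5]
        if all(c.isdigit() for c in counters):
            counts = tuple(int(c) for c in counters)
            if any(counts):
                flagged[name] = counts
    return flagged
```

Run against the sample, this flags only da4, the device showing 4 checksum errors in the report.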