I have been tracking down a problem with "zfs diff" that reveals itself variously as a hang (unkillable process), a panic or an error, depending on the ZFS kernel version, but which in all cases seems to be caused by corruption within the pool.  I am using FreeBSD, but the issue looks to be generic to ZFS rather than FreeBSD-specific.
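For reference, the trigger is nothing more exotic than a plain "zfs diff" against the affected snapshot; the invocation below (diffing the snapshot against the live filesystem) is illustrative rather than the exact command line:

  zfs diff tank/beckett/home@20120518 tank/beckett/home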
The hang and panic are related to the rw_enter() in
opensolaris/uts/common/fs/zfs/zap.c:zap_get_leaf_byblk()

The error is:

  Unable to determine path or stats for object 2128453 in
  tank/beckett/home@20120518: Invalid argument

A scrub reports no issues:

root@FB10-64:~ # zpool status
  pool: tank
 state: ONLINE
status: The pool is formatted using a legacy on-disk format.  The pool
        can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on software that does not
        support feature flags.
  scan: scrub repaired 0 in 3h24m with 0 errors on Wed Nov 14 01:58:36 2012
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          ada2      ONLINE       0     0     0

errors: No known data errors

But zdb says that object is the child of a plain file - which isn't sane:

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2128453
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G,
2026419 objects, rootbp DVA[0]=<0:266a0efa00:200>
DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous
unique double size=800L/200P birth=8375L/8375P fill=2026419
cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   2128453    1    16K  1.50K  1.50K  1.50K  100.00  ZFS plain file
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    ???<object#2128453>
        uid     1000
        gid     1000
        atime   Fri Mar 23 16:34:52 2012
        mtime   Sat Oct 22 16:13:42 2011
        ctime   Sun Oct 23 21:09:02 2011
        crtime  Sat Oct 22 16:13:42 2011
        gen     2237174
        mode    100444
        size    1089
        parent  2242171
        links   1
        pflags  40800000004
        xattr   0
        rdev    0x0000000000000000

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2242171
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G,
2026419 objects, rootbp DVA[0]=<0:266a0efa00:200>
DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous
unique double size=800L/200P birth=8375L/8375P fill=2026419
cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   2242171    3    16K   128K  25.4M  25.5M  100.00  ZFS plain file
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 203
        path    /jashank/Pictures/sch/pdm-a4-11/stereo-pair-2.png
        uid     1000
        gid     1000
        atime   Fri Mar 23 16:41:53 2012
        mtime   Mon Oct 24 21:15:56 2011
        ctime   Mon Oct 24 21:15:56 2011
        crtime  Mon Oct 24 21:15:37 2011
        gen     2286679
        mode    100644
        size    26625731
        parent  7001490
        links   1
        pflags  40800000004
        xattr   0
        rdev    0x0000000000000000

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 7001490
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G,
2026419 objects, rootbp DVA[0]=<0:266a0efa00:200>
DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous
unique double size=800L/200P birth=8375L/8375P fill=2026419
cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   7001490    1    16K    512     1K    512  100.00  ZFS directory
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    /jashank/Pictures/sch/pdm-a4-11
        uid     1000
        gid     1000
        atime   Thu May 17 03:38:32 2012
        mtime   Mon Oct 24 21:15:37 2011
        ctime   Mon Oct 24 21:15:37 2011
        crtime  Fri Oct 14 22:17:44 2011
        gen     2088407
        mode    40755
        size    6
        parent  6370559
        links   2
        pflags  40800000144
        xattr   0
        rdev    0x0000000000000000
        microzap: 512 bytes, 4 entries

                stereo-pair-2.png = 2242171 (type: Regular File)
                stereo-pair-2.xcf = 7002074 (type: Regular File)
                stereo-pair-1.xcf = 7001512 (type: Regular File)
                stereo-pair-1.png = 2241802 (type: Regular File)

root@FB10-64:~ #
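For anyone who wants to repeat the parent-chain walk shown above, a minimal sh sketch follows.  The loop, the 20-level safety cap and the assumption that the root directory's parent is itself are mine; everything else is just the zdb invocation already used above:

DS=tank/beckett/home@20120518
OBJ=2128453
DEPTH=0
while [ "$DEPTH" -lt 20 ]; do
        OUT=$(zdb -vvv "$DS" "$OBJ")
        # Show just the type, path and parent lines from the dnode dump.
        echo "$OUT" | egrep 'ZFS|path|parent'
        PARENT=$(echo "$OUT" | awk '$1 == "parent" { print $2 }')
        # Stop if zdb produced nothing, or when an object is its own
        # parent (which I believe is the case for the root directory).
        [ -z "$PARENT" ] && break
        [ "$PARENT" = "$OBJ" ] && break
        OBJ=$PARENT
        DEPTH=$((DEPTH + 1))
done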
The above experiments were carried out on a partial copy of the pool.
The main pool was started quite a long while ago and has been upgraded
and moved several times using send/recv (which happily and quietly
replicates the corruption).  Note that I have never (intentionally) used
extended attributes within the pool, but it has been exported to Windows
XP via Samba and possibly to OS X via NFSv3.

Does anyone have any suggestions for fixing the corruption?  One
suggestion was "tar c | tar x", but that is a last resort (since there
are 54 filesystems and ~1900 snapshots in the pool).

--
Peter Jeremy
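For completeness, a minimal sketch of what that last resort would look like for a single filesystem.  The destination pool "newtank" is hypothetical, the mountpoints are assumed to be the defaults, and this copies only the live data - it does nothing to preserve the ~1900 snapshots:

zfs create -p newtank/beckett/home
(cd /tank/beckett/home && tar cf - .) | (cd /newtank/beckett/home && tar xpf -)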