On 11/16/12 17:15, Peter Jeremy wrote:
I have been tracking down a problem with "zfs diff" that shows up
variously as a hang (unkillable process), a panic or an error,
depending on the ZFS kernel version, but seems to be caused by
corruption within the pool. I am using FreeBSD but the issue looks to
be generic ZFS, rather than FreeBSD-specific.
The hang and panic are related to the rw_enter() in
opensolaris/uts/common/fs/zfs/zap.c:zap_get_leaf_byblk().
There is probably nothing wrong with the snapshots. This is a bug in
ZFS diff. The ZPL parent pointer is only guaranteed to be correct for
directory objects. What you probably have is a file that was hard
linked multiple times, and the object its parent pointer refers to
(originally a directory) was recycled and is now a file.
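
To make the suspected sequence concrete, here is a minimal toy model in Python. It is not ZFS code: the Znode class, the object numbers and the choice of which directory ends up recorded as the parent are all assumptions made for illustration. It only shows how a single stored parent can end up naming a plain file once the directory it referred to is removed and its object number is handed out again.

```python
from dataclasses import dataclass


@dataclass
class Znode:
    """Hypothetical stand-in for a ZPL object; not the real on-disk layout."""
    kind: str    # "dir" or "file"
    parent: int  # the single parent object number stored in the znode


def stored_parent_is_dir(objects, obj_id):
    """True if the parent recorded in obj_id's znode is a live directory."""
    parent = objects.get(objects[obj_id].parent)
    return parent is not None and parent.kind == "dir"


def simulate_recycled_parent():
    """Return (ok_before, ok_after) for the assumed sequence of events."""
    objects = {
        1: Znode("dir", 1),     # root directory
        10: Znode("dir", 1),    # directory "a"
        11: Znode("dir", 1),    # directory "b"
        20: Znode("file", 11),  # file hard linked in "a" and "b";
                                # assumption: "b" is the recorded parent
    }
    ok_before = stored_parent_is_dir(objects, 20)

    # The link in "b" is removed and "b" itself is deleted. The file is
    # still reachable through "a", but its stored parent is still 11.
    del objects[11]

    # Object number 11 is later reused for a new plain file in "a".
    objects[11] = Znode("file", 10)

    ok_after = stored_parent_is_dir(objects, 20)
    return ok_before, ok_after


if __name__ == "__main__":
    print(simulate_recycled_parent())  # prints (True, False)
```

In this model nothing is damaged: every object is valid by itself, and only a lookup that trusts the stored parent of a non-directory goes wrong.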
The error is:
Unable to determine path or stats for object 2128453 in
tank/beckett/home@20120518: Invalid argument
A scrub reports no issues:
root@FB10-64:~ # zpool status
pool: tank
state: ONLINE
status: The pool is formatted using a legacy on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on software that does not support
feature flags.
scan: scrub repaired 0 in 3h24m with 0 errors on Wed Nov 14 01:58:36 2012
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
ada2 ONLINE 0 0 0
errors: No known data errors
But zdb says that object is the child of a plain file - which isn't sane:
root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2128453
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects,
rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset]
fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419
cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79
Object lvl iblk dblk dsize lsize %full type
2128453 1 16K 1.50K 1.50K 1.50K 100.00 ZFS plain file
264 bonus ZFS znode
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path ???<object#2128453>
uid 1000
gid 1000
atime Fri Mar 23 16:34:52 2012
mtime Sat Oct 22 16:13:42 2011
ctime Sun Oct 23 21:09:02 2011
crtime Sat Oct 22 16:13:42 2011
gen 2237174
mode 100444
size 1089
parent 2242171
links 1
pflags 40800000004
xattr 0
rdev 0x0000000000000000
root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2242171
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects,
rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset]
fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419
cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79
Object lvl iblk dblk dsize lsize %full type
2242171 3 16K 128K 25.4M 25.5M 100.00 ZFS plain file
264 bonus ZFS znode
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 203
path /jashank/Pictures/sch/pdm-a4-11/stereo-pair-2.png
uid 1000
gid 1000
atime Fri Mar 23 16:41:53 2012
mtime Mon Oct 24 21:15:56 2011
ctime Mon Oct 24 21:15:56 2011
crtime Mon Oct 24 21:15:37 2011
gen 2286679
mode 100644
size 26625731
parent 7001490
links 1
pflags 40800000004
xattr 0
rdev 0x0000000000000000
root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 7001490
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects,
rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset]
fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419
cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79
Object lvl iblk dblk dsize lsize %full type
7001490 1 16K 512 1K 512 100.00 ZFS directory
264 bonus ZFS znode
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /jashank/Pictures/sch/pdm-a4-11
uid 1000
gid 1000
atime Thu May 17 03:38:32 2012
mtime Mon Oct 24 21:15:37 2011
ctime Mon Oct 24 21:15:37 2011
crtime Fri Oct 14 22:17:44 2011
gen 2088407
mode 40755
size 6
parent 6370559
links 2
pflags 40800000144
xattr 0
rdev 0x0000000000000000
microzap: 512 bytes, 4 entries
stereo-pair-2.png = 2242171 (type: Regular File)
stereo-pair-2.xcf = 7002074 (type: Regular File)
stereo-pair-1.xcf = 7001512 (type: Regular File)
stereo-pair-1.png = 2241802 (type: Regular File)
root@FB10-64:~ #
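
The manual check above (look up an object, read its parent, look up the parent's type) could be scripted roughly as follows. This is a sketch that assumes the zdb output layout matches the excerpts in this message; other ZFS versions may print it differently. The function names and the trimmed sample strings are illustrative only, and the script parses saved text rather than invoking zdb itself.

```python
import re

# Object row: object, lvl, iblk, dblk, dsize, lsize, %full, then the type.
OBJECT_ROW = re.compile(
    r"^(\d+)\s+\d+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+(ZFS .+)$"
)
# Numeric znode fields of interest.
FIELD_ROW = re.compile(r"^(parent|links)\s+(\d+)$")


def parse_zdb_object(text):
    """Pull object number, type, parent and links out of zdb output text."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = OBJECT_ROW.match(line)
        if m:
            info["object"] = int(m.group(1))
            info["type"] = m.group(2)
            continue
        m = FIELD_ROW.match(line)
        if m:
            info[m.group(1)] = int(m.group(2))
    return info


def parent_is_directory(child, parent):
    """True if 'parent' is the object named by child's parent field
    and is a directory."""
    return (
        parent.get("object") == child.get("parent")
        and parent.get("type") == "ZFS directory"
    )


# Trimmed copies of the three zdb outputs quoted in this message.
CHILD = """\
2128453 1 16K 1.50K 1.50K 1.50K 100.00 ZFS plain file
264 bonus ZFS znode
parent 2242171
links 1
"""
PARENT = """\
2242171 3 16K 128K 25.4M 25.5M 100.00 ZFS plain file
264 bonus ZFS znode
parent 7001490
links 1
"""
GRANDPARENT = """\
7001490 1 16K 512 1K 512 100.00 ZFS directory
264 bonus ZFS znode
parent 6370559
links 2
"""

if __name__ == "__main__":
    child = parse_zdb_object(CHILD)
    parent = parse_zdb_object(PARENT)
    grandparent = parse_zdb_object(GRANDPARENT)
    print(parent_is_directory(child, parent))        # prints False
    print(parent_is_directory(parent, grandparent))  # prints True
```

Feeding it the text of one "zdb -vvv <dataset> <object>" run per object would flag any object whose recorded parent is not a directory.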
The above experiments were carried out on a partial copy of the pool.
The main pool was created quite a long while ago and has been upgraded
and moved several times using send/recv (which happily and quietly
replicates the corruption). Note that I have never (intentionally)
used extended attributes within the pool but it has been exported to
Windows XP via Samba and possibly to OS X via NFSv3.
Does anyone have any suggestions for fixing the corruption? One
suggestion was "tar c | tar x" but that is a last resort (since there
are 54 filesystems and ~1900 snapshots in the pool).
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss