I have been tracking down a problem with "zfs diff" that shows up
variously as a hang (unkillable process), a panic or an error,
depending on the ZFS kernel version, but which seems to be caused by
corruption within the pool.  I am using FreeBSD, but the issue looks to
be generic ZFS rather than FreeBSD-specific.

The hang and panic are related to the rw_enter() in
opensolaris/uts/common/fs/zfs/zap.c:zap_get_leaf_byblk()

The error is:
Unable to determine path or stats for object 2128453 in tank/beckett/home@20120518: Invalid argument

A scrub reports no issues:
root@FB10-64:~ # zpool status
  pool: tank
 state: ONLINE
status: The pool is formatted using a legacy on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on software that does not support feature
        flags.
  scan: scrub repaired 0 in 3h24m with 0 errors on Wed Nov 14 01:58:36 2012
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          ada2      ONLINE       0     0     0

errors: No known data errors

But zdb says that object is the child of a plain file - which isn't sane:

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2128453
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   2128453    1    16K  1.50K  1.50K  1.50K  100.00  ZFS plain file
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED 
        dnode maxblkid: 0
        path    ???<object#2128453>
        uid     1000
        gid     1000
        atime   Fri Mar 23 16:34:52 2012
        mtime   Sat Oct 22 16:13:42 2011
        ctime   Sun Oct 23 21:09:02 2011
        crtime  Sat Oct 22 16:13:42 2011
        gen     2237174
        mode    100444
        size    1089
        parent  2242171
        links   1
        pflags  40800000004
        xattr   0
        rdev    0x0000000000000000

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2242171
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   2242171    3    16K   128K  25.4M  25.5M  100.00  ZFS plain file
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED 
        dnode maxblkid: 203
        path    /jashank/Pictures/sch/pdm-a4-11/stereo-pair-2.png
        uid     1000
        gid     1000
        atime   Fri Mar 23 16:41:53 2012
        mtime   Mon Oct 24 21:15:56 2011
        ctime   Mon Oct 24 21:15:56 2011
        crtime  Mon Oct 24 21:15:37 2011
        gen     2286679
        mode    100644
        size    26625731
        parent  7001490
        links   1
        pflags  40800000004
        xattr   0
        rdev    0x0000000000000000

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 7001490
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=<0:266a0efa00:200> DVA[1]=<0:31b07fbc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   7001490    1    16K    512     1K    512  100.00  ZFS directory
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED 
        dnode maxblkid: 0
        path    /jashank/Pictures/sch/pdm-a4-11
        uid     1000
        gid     1000
        atime   Thu May 17 03:38:32 2012
        mtime   Mon Oct 24 21:15:37 2011
        ctime   Mon Oct 24 21:15:37 2011
        crtime  Fri Oct 14 22:17:44 2011
        gen     2088407
        mode    40755
        size    6
        parent  6370559
        links   2
        pflags  40800000144
        xattr   0
        rdev    0x0000000000000000
        microzap: 512 bytes, 4 entries

                stereo-pair-2.png = 2242171 (type: Regular File)
                stereo-pair-2.xcf = 7002074 (type: Regular File)
                stereo-pair-1.xcf = 7001512 (type: Regular File)
                stereo-pair-1.png = 2241802 (type: Regular File)

root@FB10-64:~ #
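
The manual check above (look up an object's parent, then see whether
that parent is really a directory) could be scripted roughly as
follows.  This is only a sketch: the helper names zdb_field, zdb_type
and check_parent are made up for illustration and are not part of ZFS,
and the parsing assumes the "zdb -vvv" output layout shown above, which
may differ between ZFS versions.

```sh
# Print the value of a named znode field (e.g. "parent") from
# "zdb -vvv <dataset> <object>" output read on stdin.
zdb_field() {
    awk -v f="$1" '$1 == f { print $2; exit }'
}

# Print the object type (e.g. "ZFS plain file") from the row that
# follows the "Object lvl iblk ..." table header; the type is assumed
# to start at column 8, as in the output above.
zdb_type() {
    awk '$1 == "Object" {
        if ((getline) > 0) {
            t = $8
            for (i = 9; i <= NF; i++) t = t " " $i
            print t
        }
        exit
    }'
}

# check_parent <dataset> <object>
# Returns 0 if the object's parent is a ZFS directory, 1 if it is
# something else, 2 if no parent field could be found.
check_parent() {
    cp_ds=$1
    cp_obj=$2
    cp_parent=$(zdb -vvv "$cp_ds" "$cp_obj" | zdb_field parent)
    if [ -z "$cp_parent" ]; then
        echo "object $cp_obj: no parent field found" >&2
        return 2
    fi
    cp_type=$(zdb -vvv "$cp_ds" "$cp_parent" | zdb_type)
    if [ "$cp_type" = "ZFS directory" ]; then
        return 0
    fi
    echo "object $cp_obj: parent $cp_parent is '$cp_type', not a directory"
    return 1
}
```

With the output shown above, "check_parent tank/beckett/home@20120518
2128453" should flag parent 2242171 as a plain file, while the same
check on 2242171 itself should pass because 7001490 is a directory.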

The above experiments were carried out on a partial copy of the pool.
The main pool was created quite a long while ago and has been upgraded
and moved several times using send/recv (which happily and quietly
replicates the corruption).  Note that I have never (intentionally)
used extended attributes within the pool, but it has been exported to
Windows XP via Samba and possibly to OS X via NFSv3.

Does anyone have any suggestions for fixing the corruption?  One
suggestion was "tar c | tar x" but that is a last resort (since there
are 54 filesystems and ~1900 snapshots in the pool).
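
For reference, the "tar c | tar x" approach would amount to something
like the sketch below, repeated for each filesystem.  The function name
copy_tree is a placeholder of mine, and the destination directory is
assumed to exist already.  It copies only the live file contents, so
none of the snapshot history comes across, which is why I would rather
avoid it.

```sh
# copy_tree <source-dir> <existing-destination-dir>
# Stream the tree under the source through tar into the destination,
# preserving permissions on extraction (-p).
copy_tree() {
    (cd "$1" && tar cf - .) | (cd "$2" && tar xpf -)
}
```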

-- 
Peter Jeremy
