The zdb interface is certainly unstable.  We plan to do this
automatically at a future date (bugid not handy), but it's a little
tricky for live filesystems.  If your filesystem is undergoing a lot of
churn, you may notice that zdb(1M) blows up with an I/O error or
assertion failure somewhere, because it's not in sync with the kernel's
version of the on-disk state.  Eventually, we will have a method of
doing this at the ZPL layer, so that we can correctly get this
information for mounted filesystems.

So feel free to demonstrate this (it's the only usable workaround at the
moment), with the caveats that:

        - zdb(1M) is unstable and can change at any point
        - it may not work on a live pool
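With those caveats in mind, the workaround could be scripted roughly as
below.  This is only a sketch: the `list_damaged_objects` helper name is
mine, and the field layout it parses is simply what the snv_41
`zpool status -v` output quoted further down happened to print, so the
format may change along with the rest of the interface.  It prints the
zdb commands rather than running them, since zdb may fail on a live
pool:

```shell
# Pull the DATASET/OBJECT pairs out of `zpool status -v` output
# (read from stdin), one "dataset object" pair per line.
list_damaged_objects() {
    awk '/^[[:space:]]*DATASET[[:space:]]+OBJECT/ { in_err = 1; next }
         in_err && NF == 0 { in_err = 0 }          # blank line ends table
         in_err && NF >= 2 { print $1, $2 }'       # dataset, object number
}

# Sample taken from the status output quoted in this thread:
zpool_status_sample='errors: The following persistent errors have been detected:

          DATASET      OBJECT  RANGE
          local/music  31018   6291456-6422528
          local/music  37932   1572864-1703936'

# For each damaged object, emit the zdb invocation that would resolve
# it to a path.  Run the commands by hand (on an idle pool) if you dare.
printf '%s\n' "$zpool_status_sample" | list_damaged_objects |
while read -r dataset object; do
    echo "zdb -vvv $dataset $object"
done
# Prints:
#   zdb -vvv local/music 31018
#   zdb -vvv local/music 37932
```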

We've also thought about how to repair such damage.  Plain file contents
are pretty easy, but metadata can be tricky, because we don't know the
extent of blocks that it references.  So if we just delete it, we'll
leak blocks now and forever.
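For the easy (plain file) case, repair today is just replacing the file
by hand once you know its path.  A minimal sketch, assuming you still
have a good copy somewhere -- the backup path and mountpoint below are
hypothetical, and `restore_file` is just a name I made up:

```shell
# Overwrite a corrupt copy of a plain file with a known-good one,
# preserving mode and timestamps.
restore_file() {
    cp -p -- "$1" "$2"
}

# Usage against the file identified in this thread (adjust the backup
# location and mountpoint for your site):
#   restore_file "/backup/music/Mos Def/Black on Both Sides/03 Love.mp3" \
#                "/local/music/Mos Def/Black on Both Sides/03 Love.mp3"
#   zpool scrub local    # re-read everything; the persistent error
#                        # should clear once the checksums verify
```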

- Eric

On Thu, Jul 20, 2006 at 07:39:08AM -0600, Gregory Shaw wrote:
> Hi.  I'm in the process of writing an introductory paper on ZFS.    
> The paper is meant to be something that could be given to a systems  
> admin at a site to introduce ZFS and document common procedures for  
> using ZFS.
> 
> In the paper, I want to document the method for identifying which  
> file has a checksum error.  In previous discussions on this alias,  
> I've used the following method:
> 
> zpool status -v
>   pool: local
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
> scrub: scrub completed with 4 errors on Wed Jul 12 20:38:03 2006
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         local       ONLINE       0     0     8
>           c0d0s7    ONLINE       0     0     4
>           c1d0s2    ONLINE       0     0     4
> 
> errors: The following persistent errors have been detected:
> 
>           DATASET      OBJECT  RANGE
>           local/music  31018   6291456-6422528
>           local/music  37932   1572864-1703936
>           local/music  12895   4063232-4194304
>           local/music  7782    3145728-3276800
> 
>  zdb -vvv local/music 31018
> Dataset local/music [ZPL], ID 21, cr_txg 286098, last_txg 569229,  
> 266G, 47341 objects, rootbp [L0 DMU objset] 400L/200P DVA[0] 
> =<1:1e60334600:200> DVA[1]=<0:1f34545e00:200> DVA[2] 
> =<1:209bb8a00:200> fletcher4 lzjb LE contiguous birth=569229  
> fill=47341 cksum=bfbec0b7e:4cabe29d1ca:f8ffe68a911f:22341ff0761b57
> 
>     Object  lvl   iblk   dblk  lsize  asize  type
>      31018    2    16K   128K  7.50M  7.51M  ZFS plain file
>                                  264  bonus  ZFS znode
>         path    /Mos Def/Black on Both Sides/03 Love.mp3
>         atime   Tue Jul  4 01:26:27 2006
>         mtime   Sat Apr 15 20:17:19 2006
>         ctime   Tue Jul  4 01:26:27 2006
>         crtime  Tue Jul  4 01:26:26 2006
>         gen     328624
>         mode    100755
>         size    7762952
>         parent  26652
>         links   1
>         xattr   0
>         rdev    0x0000000000000000
> 
> The above is a real error that I've encountered on a snv_41 machine  
> that I use to store a backup of my music collection.   It's a x86 (32- 
> bit) machine that has either bad disks, or, a bad controller.
> 
> My question:  Is the above an interface that should be documented as  
> the method for identifying what file has an error?  Or is there some  
> other interface that is either better documented or better supported?
> 
> I don't want to put unstable interfaces in the document if I can  
> avoid it.
> 
> Thanks!
> 
> -----
> Gregory Shaw, IT Architect
> Phone: (303) 673-8273        Fax: (303) 673-8273
> ITCTO Group, Sun Microsystems Inc.
> 1 StorageTek Drive MS 4382              [EMAIL PROTECTED] (work)
> Louisville, CO 80028-4382                 [EMAIL PROTECTED] (home)
> "When Microsoft writes an application for Linux, I've Won." - Linus  
> Torvalds
> 
> 

> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
