Bill Sommerfeld wrote:
> On Fri, 2008-07-18 at 10:28 -0700, Jürgen Keil wrote:
> > > I ran a scrub on a root pool after upgrading to snv_94, and got checksum
> > > errors:
> >
> > Hmm, after reading this, I started a zpool scrub on my mirrored pool,
> > on a system that is running post snv_94 bits: It also found checksum errors
> >
>
> out of curiosity, is this a root pool?
It started as a standard pool and is using the version 3 zpool format.
I'm using a small ufs root, and have /usr as a zfs filesystem on
that pool.
At some point in the past I set up a zfs root and /usr filesystem
for experimenting with xVM unstable bits.
> A second system of mine with a mirrored root pool (and an additional
> large multi-raidz pool) shows the same symptoms on the mirrored root
> pool only.
>
> once is accident. twice is coincidence. three times is enemy action :-)
>
> I'll file a bug as soon as I can (I'm travelling at the moment with
> spotty connectivity), citing my and your reports.
Btw, I also found the scrub checksum errors on a non-mirrored zpool
(a laptop with only one hdd), and on a non-mirrored zpool striped
across two S-ATA drives.
I think that in my case the cause of the scrub checksum errors is an
open ZIL transaction on an *unmounted* zfs filesystem. In the past
such a state prevented creating snapshots of the unmounted zfs
(see bugs 6482985 and 6462803); that is still the case, but now it
also seems to trigger checksum errors during a zpool scrub.
Stack backtrace for the ECKSUM (which gets translated into EIO errors
in arc_read_done()):
  1  64703  arc_read_nolock:return, rval 5
              zfs`zil_read_log_block+0x140
              zfs`zil_parse+0x155
              zfs`traverse_zil+0x55
              zfs`scrub_visitbp+0x284
              zfs`scrub_visit_rootbp+0x4e
              zfs`scrub_visitds+0x82
              zfs`dsl_pool_scrub_sync+0x109
              zfs`dsl_pool_sync+0x158
              zfs`spa_sync+0x254
              zfs`txg_sync_thread+0x226
              unix`thread_start+0x8
Does a "zdb -ivv {pool}" report any ZIL headers with a claim_txg != 0
on your pools? Is the dataset that is associated with such a ZIL an
unmounted zfs?
# zdb -ivv files | grep claim_txg
ZIL header: claim_txg 5164405, seq 0
ZIL header: claim_txg 0, seq 0
ZIL header: claim_txg 0, seq 0
ZIL header: claim_txg 0, seq 0
ZIL header: claim_txg 0, seq 0
ZIL header: claim_txg 5164405, seq 0
ZIL header: claim_txg 0, seq 0
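If it helps, the filtering can be scripted; a rough sketch (the sample
lines below just mimic the shape of the "zdb -ivv {pool} | grep claim_txg"
output above, a real run would pipe zdb itself):

```shell
# Sample lines in the shape of zdb's ZIL header output; values copied
# from the pool above. Replace this with: zdb -ivv files | grep claim_txg
zdb_output='ZIL header: claim_txg 5164405, seq 0
ZIL header: claim_txg 0, seq 0
ZIL header: claim_txg 5164405, seq 0'

# Keep only headers whose claim_txg is non-zero, i.e. ZILs that were
# claimed at pool import but not (yet) replayed.
unplayed=$(printf '%s\n' "$zdb_output" | awk '$3 == "claim_txg" && $4 != "0," {print}')
printf '%s\n' "$unplayed"
```

Each line printed is a candidate dataset to check for the unmounted-with-
unplayed-ZIL state.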
# zdb -ivvvv files/matrix-usr
Dataset files/matrix-usr [ZPL], ID 216, cr_txg 5091978, 2.39G, 192089 objects
ZIL header: claim_txg 5164405, seq 0
first block: [L0 ZIL intent log] 1000L/1000P DVA[0]=<0:12421e0000:1000>
zilog uncompressed LE contiguous birth=5163908 fill=0
cksum=c368086f1485f7c4:39a549a81d769386:d8:3
Block seqno 3, already claimed, [L0 ZIL intent log] 1000L/1000P
DVA[0]=<0:12421e0000:1000> zilog uncompressed LE contiguous birth=5163908
fill=0 cksum=c368086f1485f7c4:39a549a81d769386:d8:3
On two of my zpools I've eliminated the zpool scrub checksum errors by
mounting and then unmounting the zfs with the unplayed ZIL.
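Roughly, that workaround amounts to the following (a sketch;
files/matrix-usr is the dataset from my zdb output above, substitute
your own pool and dataset names):

```shell
# Mounting the fs replays the outstanding intent-log records;
# unmounting returns it to its previous (unmounted) state, now with
# a clean ZIL.
zfs mount files/matrix-usr
zfs umount files/matrix-usr

# Re-run the scrub and check whether the checksum errors are gone.
zpool scrub files
zpool status -v files
```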
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss