Bill Sommerfeld wrote:
> On Fri, 2008-07-18 at 10:28 -0700, Jürgen Keil wrote:
> > > I ran a scrub on a root pool after upgrading to snv_94, and got checksum
> > > errors:
> >
> > Hmm, after reading this, I started a zpool scrub on my mirrored pool,
> > on a system that is running post snv_94 bits: It also found checksum errors
> >
>
> out of curiosity, is this a root pool?
It started as a standard pool and is using the version 3 zpool format.
I'm using a small ufs root, and have /usr as a zfs filesystem on
that pool.
At some point in the past I set up a zfs root and /usr filesystem
for experimenting with xVM unstable bits.
> A second system of mine with a mirrored root pool (and an additional
> large multi-raidz pool) shows the same symptoms on the mirrored root
> pool only.
>
> once is accident. twice is coincidence. three times is enemy action :-)
>
> I'll file a bug as soon as I can (I'm travelling at the moment with
> spotty connectivity), citing my and your reports.
Btw, I also found the scrub checksum errors on a non-mirrored zpool
(a laptop with only one hdd), and on a non-mirrored zpool striped
across two S-ATA drives.
I think that in my case the cause of the scrub checksum errors is an
open ZIL transaction on an *unmounted* zfs filesystem. In the past
such a state prevented creating snapshots of the unmounted zfs
(see bugs 6482985 and 6462803); that is still the case, but now it
also seems to trigger checksum errors during a zpool scrub.
Stack backtrace for the ECKSUM (which gets translated into EIO errors
in arc_read_done()):
  1  64703  arc_read_nolock:return, rval 5
              zfs`zil_read_log_block+0x140
              zfs`zil_parse+0x155
              zfs`traverse_zil+0x55
              zfs`scrub_visitbp+0x284
              zfs`scrub_visit_rootbp+0x4e
              zfs`scrub_visitds+0x82
              zfs`dsl_pool_scrub_sync+0x109
              zfs`dsl_pool_sync+0x158
              zfs`spa_sync+0x254
              zfs`txg_sync_thread+0x226
              unix`thread_start+0x8
Does a "zdb -ivv {pool}" report any ZIL headers with a claim_txg != 0
on your pools? Is the dataset that is associated with such a ZIL an
unmounted zfs?
# zdb -ivv files | grep claim_txg
ZIL header: claim_txg 5164405, seq 0
ZIL header: claim_txg 0, seq 0
ZIL header: claim_txg 0, seq 0
ZIL header: claim_txg 0, seq 0
ZIL header: claim_txg 0, seq 0
ZIL header: claim_txg 5164405, seq 0
ZIL header: claim_txg 0, seq 0
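If it helps, the filtering can be scripted; a rough sketch (the sample
lines below just mimic the shape of the "zdb -ivv {pool} | grep claim_txg"
output above, a real run would pipe zdb itself):

```shell
# Sample lines in the shape of zdb's ZIL header output; values copied
# from the pool above. Replace this with: zdb -ivv files | grep claim_txg
zdb_output='ZIL header: claim_txg 5164405, seq 0
ZIL header: claim_txg 0, seq 0
ZIL header: claim_txg 5164405, seq 0'

# Keep only headers whose claim_txg is non-zero, i.e. ZILs that were
# claimed at pool import but not (yet) replayed.
unplayed=$(printf '%s\n' "$zdb_output" | awk '$3 == "claim_txg" && $4 != "0," {print}')
printf '%s\n' "$unplayed"
```

Each line printed is a candidate dataset to check for the unmounted-with-
unplayed-ZIL state.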
# zdb -ivvvv files/matrix-usr
Dataset files/matrix-usr [ZPL], ID 216, cr_txg 5091978, 2.39G, 192089 objects
ZIL header: claim_txg 5164405, seq 0
first block: [L0 ZIL intent log] 1000L/1000P DVA[0]=<0:12421e0000:1000>
zilog uncompressed LE contiguous birth=5163908 fill=0
cksum=c368086f1485f7c4:39a549a81d769386:d8:3
Block seqno 3, already claimed, [L0 ZIL intent log] 1000L/1000P
DVA[0]=<0:12421e0000:1000> zilog uncompressed LE contiguous birth=5163908
fill=0 cksum=c368086f1485f7c4:39a549a81d769386:d8:3
On two of my zpools I've eliminated the zpool scrub checksum errors by
mounting and then unmounting the zfs with the unplayed ZIL.
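Roughly, that workaround amounts to the following (a sketch;
files/matrix-usr is the dataset from my zdb output above, substitute
your own pool and dataset names):

```shell
# Mounting the fs replays the outstanding intent-log records;
# unmounting returns it to its previous (unmounted) state, now with
# a clean ZIL.
zfs mount files/matrix-usr
zfs umount files/matrix-usr

# Re-run the scrub and check whether the checksum errors are gone.
zpool scrub files
zpool status -v files
```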
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss