On Tue, Mar 23, 2010 at 07:22:59PM -0400, Frank Middleton wrote:
> On 03/22/10 11:50 PM, Richard Elling wrote:
>  
>> Look again, the checksums are different.
>
> Whoops, you are correct, as usual. Just 6 bits out of 256 different...
>
> Look which bits are different -  digits 24, 53-56 in both cases.

This is very likely an error introduced during the calculation of
the hash, rather than an error in the input data.  I don't know how
that helps narrow down the source of the problem, though.
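
To make that reasoning concrete, here is a minimal Python sketch (not
anything from ZFS itself).  It counts how many bits and hex digits
differ between two checksums, then flips a single input bit under
SHA-256, which stands in for a strong hash here.  The helper names and
the 4096-byte buffer are made up for illustration.  With a strong hash,
one flipped input bit changes roughly half of the output bits, so a
handful of differing bits looks more like damage to the checksum value
or its calculation.  Weaker checksums such as the fletcher family mix
less thoroughly, so treat this as suggestive only.

```python
import hashlib


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex checksums differ."""
    if len(hex_a) != len(hex_b):
        raise ValueError("checksums must have the same length")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def differing_digits(hex_a: str, hex_b: str) -> list:
    """Return the 1-based positions of the hex digits that differ."""
    return [i for i, (x, y) in enumerate(zip(hex_a, hex_b), start=1) if x != y]


# Hypothetical 4 KiB block; the contents are arbitrary.
block = bytearray(b"x" * 4096)
good = hashlib.sha256(block).hexdigest()

# Flip one bit of the input and hash again.
block[100] ^= 0x01
bad = hashlib.sha256(block).hexdigest()

print("bits differing:  ", differing_bits(good, bad))
print("digits differing:", len(differing_digits(good, bad)))
```

Feeding the two checksums from the error report into differing_bits()
and differing_digits() would reproduce the "6 bits, digits 24 and
53-56" observation directly.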

It suggests an experiment: try switching to another hash algorithm.
It may move the problem around, or even make it worse, of course.

I'm also reminded of a thread about the implementation of fletcher2
being flawed; you may be better off switching regardless.

>>> o Why is the file flagged by ZFS as fatally corrupted still accessible?
>
> This is the part I was hoping to get answers for since AFAIK this
> should be impossible. Since none of this is having any operational
> impact, all of these issues are of interest only, but this is a bit scary!

Only the blocks with bad checksums should return errors.  Maybe
you're not reading those blocks, or the transient error doesn't recur
when you actually read them, or the read is satisfied from the other
side of the mirror.

Repeated errors in the same file could also be a symptom of an error
calculating the hash when the file was written.  If a bit-flipping
fault that strikes with some given probability is at the root of it,
a bad stored checksum would invert the probabilities of "correct" and
"error" results: most later reads would compute the right hash, which
then fails to match the wrong one on disk.
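
A toy simulation of that inversion, as a sketch under stated
assumptions: each checksum calculation goes wrong with a hypothetical
probability p_flip, and a faulty calculation always yields the same
wrong value (loosely suggested by the same digits differing in both
reports, but still an assumption).  The function name and parameters
are invented for illustration.

```python
import random


def read_error_rate(p_flip, stored_is_bad, trials=100_000, seed=0):
    """Estimate how often a read reports a checksum mismatch.

    p_flip        -- assumed chance that one checksum calculation is faulty
    stored_is_bad -- True if the checksum written to disk was itself faulty
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        # Did the calculation performed at read time go wrong?
        computed_is_bad = rng.random() < p_flip
        # A mismatch is reported whenever stored and computed values disagree.
        if computed_is_bad != stored_is_bad:
            errors += 1
    return errors / trials


p = 0.01  # hypothetical fault probability, not a measured value
print("stored checksum good:", read_error_rate(p, stored_is_bad=False))
print("stored checksum bad: ", read_error_rate(p, stored_is_bad=True))
```

Under these assumptions a block written with a good checksum fails
about p of the time, while a block written with a bad checksum fails
about 1 - p of the time, which would look like a persistent error in
that one file.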

--
Dan.


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
