>>>>> "re" == Richard Elling <richard.ell...@gmail.com> writes:
>>>>> "r" == Ross  <myxi...@googlemail.com> writes:

    re> The answer to this question must be known before the
    re> effectiveness of a checksum can be evaluated.

...well...we can use math to know that a checksum is effective.  What
you are really suggesting we evaluate ``empirically'' is the degree of
INeffectiveness of the broken checksum.

     r> ZFS stores two copies of the metadata for any block, so
     r> corrupt metadata really shouldn't happen often.

The other copy probably won't be read if the first copy read has a
valid checksum.  I think it'll more likely just lazy-panic instead.
If that's the case, the two copies won't help cover up the broken
checksum bug.  But Richard's table says metadata uses fletcher4, which
the OP said is as good as the correct algorithm would have been, even
in its broken implementation, so long as it's only used up to
128 kByte.  It's only data and the ZIL that have the relevantly broken
checksum, according to his math.
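
A toy sketch of the two-copies point, in Python rather than anything
from the ZFS source.  It assumes the reader returns the first copy
whose checksum verifies and never looks at the second, which is only
the guess stated above.  weak_sum is a deliberately bad byte-sum
standing in for a broken checksum (it is NOT fletcher2 or fletcher4),
and read_block and the sample strings are made up for illustration.

```python
def weak_sum(data: bytes) -> int:
    """Order-insensitive byte sum: a hypothetical stand-in for a weak checksum."""
    return sum(data) % 65536


def read_block(copies, stored_checksum, checksum=weak_sum):
    """Return the first copy whose checksum verifies, or None if none does.

    Assumption: later copies are consulted only when an earlier copy
    fails verification.
    """
    for data in copies:
        if checksum(data) == stored_checksum:
            return data
    return None


good = b"size=4096"
corrupt = b"size=4960"    # digits permuted: same byte sum as `good`
detected = b"size=4097"   # byte sum differs, so even weak_sum notices

stored = weak_sum(good)

# Corruption the checksum misses: the bad copy is returned and the
# good second copy is never consulted.
missed = read_block([corrupt, good], stored)

# Corruption the checksum catches: the reader falls through to copy two.
recovered = read_block([detected, good], stored)
```

Here missed is the corrupt block and recovered is the good one: the
redundant copy only helps in the cases the checksum already catches.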

    re> The overwhelming empirical evidence suggests that fletcher2
    re> catches many storage system corruptions.

What do you mean by the word ``many''?  It's a weasel-word.  It
basically means, AFAICT, ``the broken checksum still trips
sometimes.''  But have you any empirical evidence about the fraction
of real world errors which are still caught by the broken checksum
vs. those that are not?  I don't see how you could.

How about cases where checksums are not used to catch bit-flip
gremlins but are relied upon to determine whether a data structure is
fully present (committed) yet, like in the ZIL, or to determine which
half of a mirror is stale?  These are cases where checksums could be
wrong even if the storage subsystem is functioning in an ideal way.
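
A minimal sketch of the ``is it committed yet'' case, in Python.  This
is not the ZIL's on-disk format: replay, torn_log, and the records are
hypothetical, weak_sum is a deliberately bad byte-sum (not fletcher2),
and strong_sum is truncated SHA-256 standing in for a sound checksum.
The only assumption carried over from the text is that log replay
treats a matching checksum as proof that a record was fully written.

```python
import hashlib


def weak_sum(data: bytes) -> int:
    """Order-insensitive byte sum: a hypothetical stand-in for a weak checksum."""
    return sum(data) % 65536


def strong_sum(data: bytes) -> int:
    """First 8 bytes of SHA-256, standing in for a sound checksum."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def replay(log, checksum):
    """Return the payloads accepted as committed.

    Replay stops at the first record whose stored checksum does not
    match its payload; that record is taken to be the end of the log.
    """
    committed = []
    for payload, stored in log:
        if checksum(payload) != stored:
            break
        committed.append(payload)
    return committed


def torn_log(checksum):
    """Build a two-record log whose second record was only partly written."""
    first = b"set x=00"
    intended = b"set x=12"
    stale = b"set x=21"               # old contents of the same disk region
    torn = intended[:6] + stale[6:]   # new head, stale tail
    # The stored checksum is that of the intended record, not of what landed.
    return [(first, checksum(first)), (torn, checksum(intended))]


strong_result = replay(torn_log(strong_sum), strong_sum)
weak_result = replay(torn_log(weak_sum), weak_sum)
```

With strong_sum the torn record is rejected and replay stops after the
first record.  With weak_sum the torn record verifies and gets
replayed, although no bit was ever flipped by the storage.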

Checksum weakness on ZFS where checksums are presumed good by other
parts of the design could potentially be worse overall than a
checksumless design.  That's not my impression, but it's the right
place to put the bar.  Ray's ``well at least it's better than no
checksums'' is wrong because it presumes ZFS could function as well as
another filesystem if ZFS were using a hypothetical null checksum.  It
couldn't.

Anyway I'm glad the problem is both fixed and also avoidable on the
broken systems.  I just think the doublespeak after the fact is, once
again, not helping anyone.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss