>>>>> "re" == Richard Elling <richard.ell...@gmail.com> writes:
>>>>> "r" == Ross <myxi...@googlemail.com> writes:

    re> The answer to this question must be known before the
    re> effectiveness of a checksum can be evaluated.

...well...we can use math to know that a checksum is effective. What you are really suggesting we evaluate ``empirically'' is the degree of INeffectiveness of the broken checksum.

     r> ZFS stores two copies of the metadata for any block, so
     r> corrupt metadata really shouldn't happen often.

The other copy probably won't be read if the first copy read has a valid checksum. I think it'll more likely just lazy-panic instead. If that's the case, the two copies won't help cover up the broken checksum bug.

But Richard's table says metadata uses fletcher4, which the OP said is as good as the correct algorithm would have been, even in its broken implementation, so long as it's only used up to 128 kByte. It's only data and the ZIL that have the relevantly-broken checksum, according to his math.

    re> The overwhelming empirical evidence suggests that fletcher2
    re> catches many storage system corruptions.

What do you mean by the word ``many''? It's a weasel-word. It basically means, AFAICT, ``the broken checksum still trips sometimes.'' But have you any empirical evidence about the fraction of real-world errors which are still caught by the broken checksum vs. those that are not? I don't see how you could.

How about cases where checksums are not used to correct bit-flip gremlins but are relied upon to determine whether a data structure is fully present (committed) yet, like in the ZIL, or to determine which half of a mirror is stale---these are cases where checksums could be wrong even if the storage subsystem is functioning in an ideal way.

Checksum weakness on ZFS, where checksums are presumed good by other parts of the design, could potentially be worse overall than a checksumless design. That's not my impression, but it's the right place to put the bar.

Ray's ``well at least it's better than no checksums'' is wrong because it presumes ZFS could function as well as another filesystem if ZFS were using a hypothetical null checksum. It couldn't.

Anyway, I'm glad the problem is both fixed and also avoidable on the broken systems. I just think the doublespeak after the fact is, once again, not helping anyone.
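
To make the ``we can use math'' point concrete, here is a minimal sketch of how a wraparound checksum can be shown, without any field data, to miss a specific class of corruption. It is not the ZFS source. The names fletcher2_style and flip_msb are made up for illustration, and the layout (two interleaved lanes of 64-bit little-endian words summed into 64-bit accumulators) is an assumption about a fletcher2-like design. Whether this is the same weakness discussed in this thread is also an assumption.

```python
import struct

MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound


def fletcher2_style(data: bytes) -> tuple:
    """Fletcher-like sum: two lanes of 64-bit words, 64-bit accumulators.

    Hypothetical layout for illustration; not taken from the ZFS source.
    """
    if len(data) % 16:
        raise ValueError("length must be a multiple of 16 bytes")
    words = struct.unpack("<%dQ" % (len(data) // 8), data)
    a0 = a1 = b0 = b1 = 0
    for i in range(0, len(words), 2):
        a0 = (a0 + words[i]) & MASK64      # first-order sum, lane 0
        a1 = (a1 + words[i + 1]) & MASK64  # first-order sum, lane 1
        b0 = (b0 + a0) & MASK64            # second-order sum, lane 0
        b1 = (b1 + a1) & MASK64            # second-order sum, lane 1
    return (a0, a1, b0, b1)


def flip_msb(data: bytes, word_index: int) -> bytes:
    """Return a copy of data with bit 63 of the given 64-bit word flipped."""
    buf = bytearray(data)
    buf[word_index * 8 + 7] ^= 0x80  # little-endian: top bit is in the last byte
    return bytes(buf)


block = bytes(range(256)) * 2  # 512 bytes, i.e. 64 words, 32 per lane

# One flipped top bit changes a0 by 2^63, so it is caught.
single = flip_msb(block, 0)

# Two flipped top bits in the same lane change a0 by 2^64, which wraps to 0.
# Each flip changes b0 by (even weight) * 2^63, which also wraps to 0.
double = flip_msb(flip_msb(block, 0), 4)

print(fletcher2_style(single) == fletcher2_style(block))
print(fletcher2_style(double) == fletcher2_style(block))
```

The second comparison is the interesting one: the data differs but the sums agree, and that follows from the arithmetic alone rather than from counting how often the checksum happened to trip in the field.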
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss