On Jan 6, 2011, at 11:44 AM, Peter Taps wrote:

> Folks,
>
> I have been told that the checksum value returned by Sha256 is almost
> guaranteed to be unique. In fact, if Sha256 fails in some case, we have a
> bigger problem such as memory corruption, etc. Essentially, adding
> verification to sha256 is an overkill.

I disagree. I do not believe you can uniquely identify all possible
permutations of 1 million bits using only 256 bits.

> Perhaps (Sha256+NoVerification) would work 99.999999% of the time. But
> (Fletcher+Verification) would work 100% of the time.
>
> Which one of the two is a better deduplication strategy?

If you love your data, always use verify=on.

> If we do not use verification with Sha256, what is the worst case scenario?
> Is it just more disk space occupied (because of failure to detect duplicate
> blocks) or there is a chance of actual data corruption (because two blocks
> were assumed to be duplicate although they are not)?

If you do not use verify=on, you risk repeatable data corruption.

In some postings you will find claims of the "odds being 1 in 2^256 +/-"
for a collision. This is correct. However, they will then compare this to
the odds of a disk read error. There is an important difference, however:
the disk error is likely to be noticed, but a collision is completely
silent without the verify option. This is why it is a repeatable problem,
unlike hardware failures, which are not repeatable. Accepting repeatable
and silent data corruption is a very bad tradeoff, IMNSHO.

> Or, if I go with (Sha256+Verification), how much is the overhead of
> verification on the overall process?

In my experience, there is little chance that verification will find a
mismatch. As above, you might run into a collision, but it will be rare.

> If I do go with verification, it seems (Fletcher+Verification) is more
> efficient than (Sha256+Verification). And both are 100% accurate in detecting
> duplicate blocks.

Yes. Fletcher with verification will be more performant than sha-256.
However, that option is not available in the Solaris releases.

 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
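
P.S. For anyone who wants to see the failure mode rather than take it on
faith, here is a minimal Python sketch. It is a toy model and not how ZFS
implements dedup: DedupStore, weak_checksum, and find_colliding_pair are
hypothetical names made up for illustration, and the checksum is deliberately
cut down to one byte of SHA-256 so that a collision shows up within a few
hundred blocks instead of being astronomically rare. The point is only the
difference in behaviour between trusting the checksum and comparing the
bytes.

```python
import hashlib


def weak_checksum(block: bytes) -> bytes:
    # Assumption for illustration: keep only 1 byte of SHA-256 so that
    # collisions are easy to find. Full SHA-256 would not collide here.
    return hashlib.sha256(block).digest()[:1]


def find_colliding_pair(checksum):
    # Pigeonhole: with a 1-byte checksum there are only 256 possible values,
    # so two distinct blocks must share one within 257 tries.
    seen = {}
    n = 0
    while True:
        block = b"block-%d" % n
        key = checksum(block)
        if key in seen:
            return seen[key], block
        seen[key] = block
        n += 1


class DedupStore:
    """Toy dedup table: checksum -> stored blocks (hypothetical model)."""

    def __init__(self, checksum, verify):
        self.checksum = checksum
        self.verify = verify
        self.table = {}  # checksum -> list of distinct stored blocks
        self.refs = []   # one (checksum, index) reference per write

    def write(self, block: bytes) -> int:
        key = self.checksum(block)
        candidates = self.table.setdefault(key, [])
        if candidates and not self.verify:
            # No verification: a checksum match is assumed to be a duplicate.
            self.refs.append((key, 0))
            return len(self.refs) - 1
        if self.verify:
            # Verification: compare the actual bytes before sharing a block.
            for i, stored in enumerate(candidates):
                if stored == block:
                    self.refs.append((key, i))
                    return len(self.refs) - 1
        # New data (or a collision caught by verify): store it separately.
        candidates.append(block)
        self.refs.append((key, len(candidates) - 1))
        return len(self.refs) - 1

    def read(self, handle: int) -> bytes:
        key, i = self.refs[handle]
        return self.table[key][i]


if __name__ == "__main__":
    a, b = find_colliding_pair(weak_checksum)
    for verify in (False, True):
        store = DedupStore(weak_checksum, verify=verify)
        store.write(a)
        hb = store.write(b)
        print("verify =", verify, "-> read back b intact:", store.read(hb) == b)
```

Without verify, the second write is silently mapped onto the first block, so
reading it back returns the wrong data every time, with no error reported.
With verify, the byte comparison catches the mismatch and the block is stored
on its own.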