On Jan 6, 2011, at 11:44 AM, Peter Taps wrote:

> Folks,
> 
> I have been told that the checksum value returned by Sha256 is almost 
> guaranteed to be unique. In fact, if Sha256 fails in some case, we have a 
> bigger problem such as memory corruption, etc. Essentially, adding 
> verification to sha256 is an overkill.

I disagree. I do not believe you can uniquely identify all possible 
permutations of 1 million bits using only 256 bits.
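
The counting argument above can be sketched at a small scale. This is only an illustration: `tiny_hash` is a hypothetical 8-bit checksum (the first byte of SHA-256), chosen so the pigeonhole effect shows up after a few hundred inputs instead of an astronomically large number.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """Hypothetical 8-bit checksum: the first byte of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[0]

def find_collision(n_inputs: int = 257):
    """Hash n_inputs distinct blocks and return the first colliding pair.

    With 257 distinct inputs and only 256 possible checksum values,
    at least two inputs must share a checksum (pigeonhole principle).
    """
    seen = {}
    for i in range(n_inputs):
        block = i.to_bytes(4, "big")
        h = tiny_hash(block)
        if h in seen:
            return seen[h], block
        seen[h] = block
    return None

pair = find_collision()
print(pair)
```

The same reasoning applies to 256-bit checksums over 1-million-bit blocks; only the scale changes.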

> Perhaps (Sha256+NoVerification) would work 99.999999% of the time. But 
> (Fletcher+Verification) would work 100% of the time.
> 
> Which one of the two is a better deduplication strategy?

If you love your data, always use verify=on

> If we do not use verification with Sha256, what is the worst case scenario? 
> Is it just more disk space occupied (because of failure to detect duplicate 
> blocks) or there is a chance of actual data corruption (because two blocks 
> were assumed to be duplicate although they are not)?

If you do not use verify=on, you risk repeatable data corruption.  

In some postings you will find claims of the "odds being 1 in 2^256 +/-" for a 
collision.  This is correct.  However, they will then compare this to the odds 
of a disk read error.  There is an important difference, however -- the disk 
error is likely to be noticed, but a collision is completely silent without the 
verify option.  This is why it is a repeatable problem, unlike hardware 
failures, which are not repeatable.  Accepting repeatable and silent data 
corruption is a very bad tradeoff, IMNSHO.
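
To make the "silent and repeatable" point concrete, here is a toy sketch. It is not how ZFS implements dedup: `DedupStore`, `weak_checksum`, and `colliding_pair` are hypothetical names, and the checksum is deliberately reduced to 8 bits so that a collision can be found quickly.

```python
import hashlib

def weak_checksum(block: bytes) -> int:
    """Deliberately weak 8-bit checksum (assumption, for illustration only)."""
    return hashlib.sha256(block).digest()[0]

class DedupStore:
    """Toy in-memory dedup table: checksum -> list of distinct stored blocks."""

    def __init__(self, verify: bool):
        self.verify = verify
        self.table = {}

    def write(self, block: bytes):
        key = weak_checksum(block)
        bucket = self.table.setdefault(key, [])
        for idx, stored in enumerate(bucket):
            # Without verify, a checksum match alone is trusted.
            # With verify, the bytes must also match.
            if not self.verify or stored == block:
                return (key, idx)
        bucket.append(block)
        return (key, len(bucket) - 1)

    def read(self, ref):
        key, idx = ref
        return self.table[key][idx]

def colliding_pair():
    """Return two different blocks with the same weak checksum."""
    seen = {}
    for i in range(257):  # 257 inputs, only 256 checksum values
        block = i.to_bytes(4, "big")
        key = weak_checksum(block)
        if key in seen:
            return seen[key], block
        seen[key] = block

block_a, block_b = colliding_pair()

unverified = DedupStore(verify=False)
unverified.write(block_a)
ref = unverified.write(block_b)      # treated as a duplicate of block_a
corrupted = unverified.read(ref)     # returns block_a, with no error raised

verified = DedupStore(verify=True)
verified.write(block_a)
ref_v = verified.write(block_b)      # byte compare fails, stored separately
intact = verified.read(ref_v)        # returns block_b
```

In the unverified store, every read of the second block returns the first block's data, every time, and nothing reports a fault. That is the difference from a disk read error.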

> Or, if I go with (Sha256+Verification), how much is the overhead of 
> verification on the overall process?

In my experience, I see little chance that verification will come into play. 
As above, you might run into a collision, but it will be rare.
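
For a rough sense of how rare, the standard birthday approximation can be evaluated directly. This sketch assumes the checksum behaves like a uniformly random 256-bit function, and the pool size in the example (2^35 blocks, about 4 PiB of 128 KiB blocks) is a hypothetical figure, not one from this thread.

```python
import math

def collision_log2_probability(n_blocks: int, hash_bits: int = 256) -> float:
    """Approximate log2 of the chance that any two of n_blocks distinct
    blocks share a checksum, using the birthday bound:
    p ~= (number of pairs) / 2**hash_bits.
    """
    pairs = n_blocks * (n_blocks - 1) // 2
    return math.log2(pairs) - hash_bits

# Hypothetical pool: 2**35 unique blocks.
exponent = collision_log2_probability(2**35)
print(f"collision probability is roughly 2^{exponent:.1f}")
```

The estimate says nothing about what happens when a collision does occur, which is the point made above about silent corruption.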

> If I do go with verification, it seems (Fletcher+Verification) is more 
> efficient than (Sha256+Verification). And both are 100% accurate in detecting 
> duplicate blocks.

Yes.  Fletcher with verification will be more performant than sha-256.
However, that option is not available in the Solaris releases.
 -- richard


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss