On 11 July, 2012 - Sašo Kiselkov sent me these 1,4K bytes:

> On 07/11/2012 10:50 AM, Ferenc-Levente Juhos wrote:
> > Although, as you pointed out, the chance of a SHA-256 collision is
> > minimal, it can still happen, and that would mean
> > that the dedup algorithm discards a block that it thinks is a duplicate.
> > It is probably better anyway to do a byte-for-byte comparison
> > when the hashes match, to be sure that the blocks are really identical.
> > 
> > The funny thing here is that ZFS tries to solve all sorts of data integrity
> > issues with checksumming and healing, etc.,
> > and on the other hand a hash collision in the dedup algorithm can cause
> > loss of data if wrongly configured.
> > 
> > Anyway, thanks for bringing up the subject. Now I know that if I
> > enable the dedup feature, I must set it to sha256,verify.
> 
> Oh jeez, I can't remember how many times this flame war has been going
> on on this list. Here's the gist: SHA-256 (or any good hash) produces a
> near uniform random distribution of output. Thus, the chances of getting
> a random hash collision are around 2^-256 or around 10^-77. If I asked
> you to pick two atoms at random *from the entire observable universe*,
> your chances of hitting on the same atom are higher than the chances of
> that hash collision. So leave dedup=on with sha256 and move on.
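
To put rough numbers on the argument above, here is a minimal sketch of the arithmetic. It assumes SHA-256 output behaves like a uniform random 256-bit value; the pool size of 2^35 blocks and the function names are hypothetical, chosen only for illustration. The first function gives the chance that one specific pair of blocks collides; the second gives the usual union (birthday) upper bound on any collision among n blocks.

```python
from fractions import Fraction

HASH_BITS = 256  # SHA-256 digest size in bits


def pair_collision_probability(hash_bits=HASH_BITS):
    """Chance that two specific, different blocks hash to the same value,
    assuming uniformly distributed hash output."""
    return Fraction(1, 2 ** hash_bits)


def birthday_bound(num_blocks, hash_bits=HASH_BITS):
    """Union-bound upper limit on the chance of ANY collision among
    num_blocks distinct blocks: C(n, 2) / 2^bits."""
    pairs = num_blocks * (num_blocks - 1) // 2
    return Fraction(pairs, 2 ** hash_bits)


if __name__ == "__main__":
    # Hypothetical pool: 2^35 unique 128 KiB blocks (about 4 PiB of data).
    n = 2 ** 35
    print("single pair :", float(pair_collision_probability()))
    print("whole pool  :", float(birthday_bound(n)))
```

The single-pair figure is the 2^-256 quoted above. The whole-pool bound is larger because every pair of blocks gets a chance to collide, but under these assumptions it still comes out around 2^-187.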

So in ZFS, which normally uses 128kB blocks, you can instead store them
100% uniquely into 32 bytes. A nice 4096x compression rate.
Decompression is a bit slower, though...
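
For the record, the arithmetic behind that figure can be sketched as follows. This only illustrates the counting (pigeonhole) argument; the constant and function names are made up for the example, and 128 kB is taken to mean 128 * 1024 bytes.

```python
BLOCK_SIZE = 128 * 1024  # bytes in a block (assumed 128 KiB)
DIGEST_SIZE = 32         # bytes in a SHA-256 digest


def size_ratio(block=BLOCK_SIZE, digest=DIGEST_SIZE):
    """How many times smaller the digest is than the block."""
    return block // digest


def log2_blocks_per_digest(block=BLOCK_SIZE, digest=DIGEST_SIZE):
    """log2 of the average number of distinct possible blocks that map to
    each digest value: 2^(8*block) blocks over 2^(8*digest) digests."""
    return 8 * block - 8 * digest


if __name__ == "__main__":
    print("ratio:", size_ratio())
    print("log2(blocks per digest):", log2_blocks_per_digest())
```

So a digest cannot identify a block uniquely among all possible blocks; the dedup argument is only that two blocks actually written to a pool are very unlikely to share one.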

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
