On 11 July, 2012 - Sašo Kiselkov sent me these 1,4K bytes:

> On 07/11/2012 10:50 AM, Ferenc-Levente Juhos wrote:
> > Actually, although as you pointed out the chance of an SHA-256
> > collision is minimal, it can still happen, and that would mean
> > the dedup algorithm discards a block that it thinks is a duplicate.
> > It's probably better anyway to do a byte-to-byte comparison
> > when the hashes match, to be sure the blocks are really identical.
> >
> > The funny thing here is that ZFS tries to solve all sorts of data
> > integrity issues with checksumming and healing, etc., and on the
> > other hand a hash collision in the dedup algorithm can cause loss
> > of data if wrongly configured.
> >
> > Anyway, thanks for bringing up the subject; now I know that if I
> > enable the dedup feature I must set it to sha256,verify.
>
> Oh jeez, I can't remember how many times this flame war has gone on
> on this list. Here's the gist: SHA-256 (or any good hash) produces a
> near-uniform random distribution of output. Thus, the chances of a
> random hash collision between two given blocks are around 2^-256, or
> around 10^-77. If I asked you to pick two atoms at random *from the
> entire observable universe*, your chances of hitting the same atom
> would be higher than the chances of that hash collision. So leave
> dedup=on with sha256 and move on.
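The back-of-the-envelope math above can be sketched with the standard birthday bound: with n random blocks and a b-bit hash, the probability of at least one collision anywhere in the pool is roughly n^2 / 2^(b+1). This is an illustrative sketch, not from the thread; the example pool size (2^53 blocks, i.e. about a zettabyte of 128 kB blocks) is an assumption.

```python
from math import log2

def collision_probability_log2(n_blocks: int, hash_bits: int = 256) -> float:
    """Return log2 of the approximate birthday-bound collision
    probability: p ~= n^2 / 2^(b+1), so log2(p) = 2*log2(n) - (b+1).
    Hypothetical helper for illustration only."""
    return 2 * log2(n_blocks) - (hash_bits + 1)

# An (assumed) zettabyte-scale pool of 128 kB blocks is about 2^53 blocks:
print(collision_probability_log2(2**53))  # → -151.0, i.e. p ≈ 2^-151
```

Even at that scale the pool-wide collision probability stays around 2^-151, which is the point of the atoms-in-the-universe comparison; `verify` trades that residual risk for an extra read per dedup hit.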
So in ZFS, which normally uses 128 kB blocks, you can instead store
them 100% uniquely in 32 bytes. A nice 4096x compression ratio;
decompression is a bit slower, though.

/Tomas

--
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss