On Mon, January 10, 2011 02:41, Eric D. Mudama wrote:
> On Sun, Jan 9 at 22:54, Peter Taps wrote:
>> Thank you all for your help. I am the OP.
>>
>> I haven't looked at the link that talks about the probability of
>> collision. Intuitively, I still wonder how the chances of collision
>> can be so low. We are reducing a 4K block to just 256 bits. If the
>> chances of collision are so low, *theoretically* it is possible to
>> reconstruct the original block from the 256-bit signature by using a
>> simple lookup. Essentially, we would now have the world's best
>> compression algorithm irrespective of whether the data is text or
>> binary. This is hard to digest.
>
> "simple" lookup isn't so simple when there are 2^256 records to
> search; however, fundamentally your understanding of hashes is
> correct. [...]
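Just to make the reduction concrete before getting into the numbers: every 4K block (32768 bits of input) is boiled down to a fixed 256-bit digest. A rough illustration in plain Python, using the standard hashlib module rather than anything ZFS-specific (the random block is made up purely for the example):

    # One 4 KiB record, roughly as dedup would see it (random data,
    # purely for illustration -- not how ZFS computes its checksums).
    import hashlib
    import os

    block = os.urandom(4096)
    digest = hashlib.sha256(block).hexdigest()

    print(len(block) * 8)    # 32768 bits of input
    print(len(digest) * 4)   # 256 bits of digest (64 hex characters)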
It should also be noted that ZFS itself can "only" address 2^128 bytes (not even 4K 'records'), and supposedly filling those 2^128 bytes would take as much energy as boiling the Earth's oceans:

http://blogs.sun.com/bonwick/entry/128_bit_storage_are_you

So recording and looking up 2^256 records would be quite an accomplishment. It's a lot of data.

If the OP wants to know why the chances are so low, he'll have to learn a bit about hash functions (which is what SHA-256 is):

http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Cryptographic_hash_function

Knowing exactly how the math works is not necessary, but understanding the principles is useful if one wants a general picture of why SHA-256 doesn't need a verification step, and why it was chosen as one of the ZFS (dedupe) checksum options.
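To put a rough number on "so low": the standard birthday-bound approximation says that with n distinct blocks and a d-bit hash, the probability of any two of them colliding is about n*(n-1)/2^(d+1). The pool size below (10^12 unique 4K blocks, around 4 PB) is just an assumed example, not anything ZFS-specific:

    # Back-of-the-envelope birthday bound for SHA-256.
    n = 10**12              # unique 4 KiB blocks in the pool (assumed example)
    d = 256                 # SHA-256 digest width in bits

    p = n * (n - 1) / 2**(d + 1)
    print(p)                # roughly 4e-54

A figure that far below the undetected error rates of the disks, buses and RAM underneath is the usual reasoning for why a verify pass on top of SHA-256 buys essentially nothing.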