Saso, I'm not flaming at all, I just happen to disagree. I understand that the chances are very, very slim, but as another poster already said, that is how the lottery works. I'm not saying one should run an exhaustive search with trillions of computers just to produce a SHA256 collision. But if I did want an exhaustive search, I could hash every possible 256-bit (32-byte) value plus one more: with 2**256 + 1 distinct inputs mapping onto only 2**256 possible digests, at least one collision is guaranteed by the pigeonhole principle. That is much more credible to me than the analogy with the age of the universe and atoms picked at random, etc.
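To make the pigeonhole point concrete, here is a tiny Python sketch of my own (nothing to do with the actual ZFS code): truncate SHA-256 to 16 bits and a collision is guaranteed within 2**16 + 1 distinct inputs; with the full 256 bits the same reasoning holds, you just need 2**256 + 1 inputs.

    # Toy illustration: a 16-bit truncation of SHA-256 must collide
    # within 2**16 + 1 distinct inputs (pigeonhole principle).
    import hashlib

    seen = {}
    for i in range(2**16 + 1):
        data = i.to_bytes(8, 'big')
        h = hashlib.sha256(data).digest()[:2]   # keep only the first 16 bits
        if h in seen:
            print("collision: inputs %d and %d share truncated hash %s"
                  % (seen[h], i, h.hex()))
            break
        seen[h] = i

In practice the loop finds a collision long before the pigeonhole bound, after roughly 2**8 inputs, which is the birthday bound for a 16-bit digest.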
The fact is, it can happen: it is entirely possible that two JPEGs exist somewhere with different content and the same hash. I can't prove the existence of such a pair, but you can't deny it either. The fact is that ZFS and everyone using it tries to correct data degradation (e.g. caused by cosmic rays), and on the other hand they're relying on a probability calculation (no matter how slim the chances are) that can potentially discard valid data. You can come up with more universe-and-atom analogies and the age of the universe, etc.; the fact remains the same. Every generation was convinced that its current best checksum or hash algorithm would stay the best forever. MD5 has demonstrated that this is not the case. Time will tell what becomes of SHA256, but why take any chances?

On Wed, Jul 11, 2012 at 11:10 AM, Sašo Kiselkov <skiselkov...@gmail.com> wrote:

> On 07/11/2012 10:50 AM, Ferenc-Levente Juhos wrote:
> > Actually, although as you pointed out the chances of an sha256
> > collision are minimal, it can still happen, and that would mean
> > that the dedup algorithm discards a block that it thinks is a duplicate.
> > It's probably better anyway to do a byte-to-byte comparison
> > if the hashes match, to be sure that the blocks are really identical.
> >
> > The funny thing here is that ZFS tries to solve all sorts of data integrity
> > issues with checksumming and healing, etc.,
> > and on the other hand a hash collision in the dedup algorithm can cause
> > loss of data if wrongly configured.
> >
> > Anyway thanks that you have brought up the subject, now I know if I will
> > enable the dedup feature I must set it to sha256,verify.
>
> Oh jeez, I can't remember how many times this flame war has been going
> on on this list. Here's the gist: SHA-256 (or any good hash) produces a
> near uniform random distribution of output. Thus, the chances of getting
> a random hash collision are around 2^-256 or around 10^-77. If I asked
> you to pick two atoms at random *from the entire observable universe*,
> your chances of hitting on the same atom are higher than the chances of
> that hash collision. So leave dedup=on with sha256 and move on.
>
> Cheers,
> --
> Saso
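P.S. For the archives, here is a rough Python sketch of what I understand the verify option to add (my own illustration, not the actual ZFS code path): when the hashes match, the candidate duplicate is still compared byte for byte before the new block is discarded.

    # Rough sketch of dedup=sha256,verify versus dedup=sha256 alone
    # (my own illustration, not the real ZFS implementation).
    import hashlib

    def is_duplicate(new_block, stored_block, verify=True):
        # Hashes differ: definitely different data, keep the new block.
        if hashlib.sha256(new_block).digest() != hashlib.sha256(stored_block).digest():
            return False
        if verify:
            # "verify": a byte-for-byte comparison catches the
            # (astronomically unlikely) case of a hash collision.
            return new_block == stored_block
        # Hash-only mode trusts the hash and discards the new block.
        return True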