On Fri, Sep 20, 2013 at 02:49:27PM +0200, Raimo Niskanen wrote:
| On Fri, Sep 20, 2013 at 11:47:06AM +0000, hru...@gmail.com wrote:
| > Andreas Gunnarsson <o-m...@zzlevo.net> wrote:
| >
| > > On Thu, Sep 19, 2013 at 01:46:20PM +0000, hru...@gmail.com wrote:
| > > > Raimo, if people believe that hash(A)=hash(B) implies A=B, so strong
| > > > believe, that they use it in their programs,
| > >
| > > It's a matter of engineering. Usually that is good enough.
| > >
| > > If you don't think it's good enough then you should probably also worry
| > > about how often strcmp(a, b) returns 0 when strings a and b don't match.
| >
| > No, that is not good enough, that is not good, that is very bad. The
| > probability of A=B under the condition hash(A)=hash(B) is close
| > to zero, not to one as Marc Espie & Co are telling here. Please, read
|
| You are again contradicting accepted knowledge without any motivation.
| You can not go on doing that. It offends people.
He's right, you know.

Let's assume a 1-bit hash function X. X(A) = X(B) has a probability of
50% (p=0.5). How many possible inputs are there to X? These are
unlimited, therefore the chance that A = B if X(A) = X(B) is zero.

Now of course, SHA and MDx are not 1-bit hash functions. Instead, they
provide longer hashes, 160 bits for SHA-1. That means there are 2^160
possible hashes. Yet there are *STILL* infinitely many input strings to
this hash. Remember that a 5 megabyte file has 2^41943040 possible
versions, so the chance that two random 5MB files are the same is
2^-41943040, and even once you know their SHA-1 sums match it is only
about 2^-41942880 (which, by all practical standards, is zero).

So *PICKING RANDOM INPUT STRINGS*, the chance that these are the same
when you know that their SHA sums match is still zero.

However, hruodr's ramblings have *nothing* to do with the rsync or
cvsync case. You don't pick _RANDOM_ input strings. You pick strings
*assuming* they're the same (since you've already copied them earlier -
note that rsync will transfer the file in full if it finds it missing
on the receiving side of the transfer).

Assuming A == B (not assuming hash(A) == hash(B)) when this is already
quite likely (because the files were once the same) allows for using
hash functions to check this assumption. It is then easily verified or
falsified. This is precisely what rsync does.

So yeah, he is right in that p(A == B) = 0 for randomly picked inputs
when it's given that hash(A) == hash(B). However, he's tricking you into
thinking he's still talking about rsync when he isn't. I have no idea
what the relevance of his point is, but he is right about it. He
could've also mentioned that the pope is catholic or that the sun is
hot. Great stuff; rsync is still OK.

An elaborate troll, but a troll nonetheless. I am *SO* glad to have
contributed to yet another megathread.

Paul 'WEiRD' de Weerd

-- 
>++++++++[<++++++++++>-]<+++++++.>+++[<------>-]<.>+++[<+
+++++++++++>-]<.>++[<------------>-]<+.--------------.[-]
                 http://www.weirdnet.nl/