On Fri, Sep 20, 2013 at 03:31:18PM +0200, Paul de Weerd wrote:
> On Fri, Sep 20, 2013 at 02:49:27PM +0200, Raimo Niskanen wrote:
> | On Fri, Sep 20, 2013 at 11:47:06AM +0000, hru...@gmail.com wrote:
> | > Andreas Gunnarsson <o-m...@zzlevo.net> wrote:
> | >
> | > > On Thu, Sep 19, 2013 at 01:46:20PM +0000, hru...@gmail.com wrote:
> | > > > Raimo, if people believe that hash(A)=hash(B) implies A=B, so strong
> | > > > believe, that they use it in their programs,
> | > >
> | > > It's a matter of engineering. Usually that is good enough.
> | > >
> | > > If you don't think it's good enough then you should probably also worry
> | > > about how often strcmp(a, b) returns 0 when strings a and b don't match.
> | >
> | > No, that is not good enough, that is not good, that is very bad. The
> | > probability of A=B under the condition hash(A)=hash(B) is close
> | > to zero, not to one as Marc Espie & Co are telling here. Please, read
> |
> | You are again contradicting accepted knowledge without any motivation.
> | You can not go on doing that. It offends people.
>
> He's right, you know.
>
> Let's assume a 1-bit hash function X. X(A) = X(B) has a probability
> of 50% (p=0.5). How many possible inputs are there to X ? These are
> unlimited, therefore the chance that A = B if X(A) = X(B) is zero.
>
> Now of course, SHA and MDx are not 1-bit hash functions. Instead,
> they provide longer hashes, 160 bits for SHA-1. That means there's
> 2^160 possible hashes. Yet there's *STILL* infinite input strings to
> this hash. Remember that a 5 megabyte file has 2^41943040 possible
> versions, so the chance that two random 5MB files are the same is
> 2^-41942880 (which, by all practical standards is zero). So *PICKING
> RANDOM INPUT STRINGS*, the chance that these are the same when you
> know that their SHA sums match is still zero.
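The exponents in the quoted paragraph can be checked with a few lines of Python. This is only a sketch under stated assumptions: "5 megabyte" is read as 5 * 2^20 bytes, the two files are independent and uniformly random, and SHA-1 is treated as an idealised 160-bit hash whose outputs collide with probability about 2^-160. The function names are made up for illustration. Under those assumptions the unconditional chance that the two files are equal is 2^-41943040, and the 2^-41942880 figure is what comes out for the chance that they are equal *given* that their SHA-1 sums match.

```python
# Work with base-2 exponents only, so no huge numbers are ever built.
FILE_BYTES = 5 * 2**20        # "5 megabyte" read as 5 MiB (assumption)
FILE_BITS = FILE_BYTES * 8    # number of bits in one such file
HASH_BITS = 160               # SHA-1 digest length in bits


def log2_p_equal(file_bits):
    """log2 of P(A == B) for two independent, uniformly random files."""
    return -file_bits


def log2_p_hash_match(hash_bits):
    """Approximate log2 of P(hash(A) == hash(B)) for an idealised hash."""
    return -hash_bits


def log2_p_equal_given_match(file_bits, hash_bits):
    """log2 of P(A == B | hash(A) == hash(B)).

    A == B implies the hashes match, so the conditional probability is
    P(A == B) / P(hash match); in log2 terms that is a subtraction.
    """
    return log2_p_equal(file_bits) - log2_p_hash_match(hash_bits)


if __name__ == "__main__":
    print("bits per file:           ", FILE_BITS)
    print("log2 P(A == B):          ", log2_p_equal(FILE_BITS))
    print("log2 P(A == B | match):  ",
          log2_p_equal_given_match(FILE_BITS, HASH_BITS))
```

Either exponent is zero for every practical purpose, which is the point being made for randomly picked inputs.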
Oh. Now I see in which way he is right... That was too obvious to see.

>
> However, hruodr's ramblings have *nothing* to do with the rsync or
> cvsync case. You don't pick _RANDOM_ input strings. You pick strings
> *assuming* they're the same (since you've already copied them earlier
> - note that rsync will transfer the file in full after it finds it
> missing on the receiving side of the transfer).
>
> Assuming A == B (not assuming hash(A) == hash(B)) when this is already
> quite likely (because the files were once the same) allows for using
> hash functions to check this assumption. It is then easily verified
> or falsified. This is precisely what rsync does.
>
> So yeah, he is right in that p(A == B) = 0 when it's given that
> hash(A) == hash(B). However, he's tricking you into thinking he's
> still talking about rsync when he isn't. I have no idea what the
> relevance of his point is, but he is right about it. He could've also
> mentioned that the pope is catholic or that the sun is hot. Great
> stuff; rsync is still OK.
>
> An elaborate troll, but a troll nonetheless.
>
> I am *SO* glad to have contributed to yet another megathread.

But a valuable contribution it was indeed!
Thank you very much! Now I can let it go.

>
> Paul 'WEiRD' de Weerd
>
> --
> >++++++++[<++++++++++>-]<+++++++.>+++[<------>-]<.>+++[<+
> +++++++++++>-]<.>++[<------------>-]<+.--------------.[-]
> http://www.weirdnet.nl/

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
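
The difference between the two cases (random inputs versus files already believed to be the same) comes down to the prior in Bayes' rule. The following is a rough sketch, not a model of what rsync actually computes: it assumes an idealised hash where two different inputs collide with probability exactly 2^-hash_bits, and the function name `posterior_equal` is invented here for illustration.

```python
from fractions import Fraction


def posterior_equal(prior_equal, hash_bits=160):
    """P(A == B | hash(A) == hash(B)) for an idealised hash_bits-bit hash.

    prior_equal is P(A == B) before looking at the hashes.
    Assumption: different inputs collide with probability 2**-hash_bits.
    """
    prior = Fraction(prior_equal)
    collision = Fraction(1, 2**hash_bits)   # P(match | A != B)
    # Bayes' rule, using P(match | A == B) = 1.
    return prior / (prior + (1 - prior) * collision)


if __name__ == "__main__":
    # Files believed to be the same with even odds: a match settles it.
    likely = posterior_equal(Fraction(1, 2))
    print("1 - posterior, prior 1/2:    ", float(1 - likely))

    # Two random 1024-bit strings (kept small so the numbers stay handy):
    # a matching hash still leaves equality essentially impossible.
    random_pair = posterior_equal(Fraction(1, 2**1024))
    print("posterior, prior 2^-1024:    ", float(random_pair))
```

With any non-negligible prior that the files are equal, a matching hash pushes the posterior to within about 2^-160 of one; with a prior as small as the one for randomly picked strings, it stays near zero. Both statements in the thread hold, each for its own prior.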