On Fri, Sep 20, 2013 at 03:31:18PM +0200, Paul de Weerd wrote:
> On Fri, Sep 20, 2013 at 02:49:27PM +0200, Raimo Niskanen wrote:
> | On Fri, Sep 20, 2013 at 11:47:06AM +0000, hru...@gmail.com wrote:
> | > Andreas Gunnarsson <o-m...@zzlevo.net> wrote:
> | > 
> | > > On Thu, Sep 19, 2013 at 01:46:20PM +0000, hru...@gmail.com wrote:
> | > > > Raimo, if people believe that hash(A)=hash(B) implies A=B, so
> | > > > strongly that they use it in their programs,
> | > >
> | > > It's a matter of engineering. Usually that is good enough.
> | > >
> | > > If you don't think it's good enough then you should probably also worry
> | > > about how often strcmp(a, b) returns 0 when strings a and b don't match.
> | > 
> | > No, that is not good enough, that is not good, that is very bad. The
> | > probability of A=B under the condition hash(A)=hash(B) is close
> | > to zero, not to one as Marc Espie & Co are telling here. Please, read
> | 
> | You are again contradicting accepted knowledge without any justification.
> | You cannot go on doing that.  It offends people.
> 
> He's right, you know.
> 
> Let's assume a 1-bit hash function X.  X(A) = X(B) has a probability
> of 50% (p=0.5).  How many possible inputs are there to X ?  These are
> unlimited, therefore the chance that A = B if X(A) = X(B) is zero.
> 
> Now of course, SHA and MDx are not 1-bit hash functions.  Instead,
> they provide longer hashes, 160 bits for SHA-1.  That means there are
> 2^160 possible hashes.  Yet there are *STILL* infinitely many input
> strings to this hash.  Remember that a 5 megabyte file has 2^41943040
> possible versions, so the chance that two random 5MB files are the
> same is 2^-41943040, and knowing that their SHA-1 sums match only
> raises that to about 2^-41942880 (which, by all practical standards,
> is zero).  So *PICKING RANDOM INPUT STRINGS*, the chance that these
> are the same when you know that their SHA sums match is still zero.

Oh. Now I see in which way he is right...  That was too obvious to see.
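
For anyone who wants to check the arithmetic, here is a minimal sketch
in Python.  It assumes an idealized hash (equal inputs always match,
unequal inputs match with probability 2^-bits), which is a modelling
assumption and not a property proven for SHA-1.  The function name
posterior_equal and the 8-bit toy files are made up for illustration.

```python
from fractions import Fraction


def posterior_equal(prior_equal: Fraction, hash_bits: int) -> Fraction:
    """P(A == B | hash(A) == hash(B)) for an idealized hash.

    Assumption: equal inputs always collide, and unequal inputs
    collide with probability 2**-hash_bits.
    """
    collide_if_different = Fraction(1, 2 ** hash_bits)
    # Total probability of seeing matching hashes at all.
    match = prior_equal + (1 - prior_equal) * collide_if_different
    return prior_equal / match


# Toy case: two random 8-bit "files" and a 1-bit hash.
toy = posterior_equal(Fraction(1, 2 ** 8), hash_bits=1)
print(toy)  # 2/257

# 5 MB case, exponents only: the prior is 2**-file_bits, and a matching
# 160-bit hash raises it by roughly a factor of 2**160.
file_bits = 5 * 1024 * 1024 * 8
print(file_bits, file_bits - 160)  # 41943040 41942880
```

Even in the toy case a matching hash barely moves the probability, and
for 5MB files only the exponents are computed, since the exact
fractions would be needlessly large.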

> 
> However, hruodr's ramblings have *nothing* to do with the rsync or
> cvsync case.  You don't pick _RANDOM_ input strings.  You pick strings
> *assuming* they're the same (since you've already copied them earlier
> - note that rsync will transfer the file in full after it finds it
> missing on the receiving side of the transfer).
> 
> Assuming A == B (not assuming hash(A) == hash(B)) when this is already
> quite likely (because the files were once the same) allows for using
> hash functions to check this assumption.  It is then easily verified
> or falsified.  This is precisely what rsync does.
> 
> So yeah, he is right in that, for randomly picked inputs, p(A == B)
> is effectively 0 even when it's given that hash(A) == hash(B).
> However, he's tricking you into thinking he's
> still talking about rsync when he isn't.  I have no idea what the
> relevance of his point is, but he is right about it.  He could've also
> mentioned that the pope is catholic or that the sun is hot.  Great
> stuff; rsync is still OK.
> 
> An elaborate troll, but a troll nonetheless.
> 
> I am *SO* glad to have contributed to yet another megathread.

But a valuable contribution it was indeed!
Thank you very much!  Now I can let it go.
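
To put a number on the "assume A == B, then verify" case Paul
describes, here is a rough sketch in Python.  It is not rsync's actual
algorithm, only a digest comparison plus the same idealized-hash
assumption (unequal inputs collide with probability 2^-bits).  The
names same_digest and false_match_probability, and the prior of 999
unchanged files out of 1000, are hypothetical.

```python
import hashlib
from fractions import Fraction


def same_digest(a: bytes, b: bytes) -> bool:
    """Compare two byte strings by SHA-1 digest only."""
    return hashlib.sha1(a).digest() == hashlib.sha1(b).digest()


def false_match_probability(prior_equal: Fraction, hash_bits: int) -> Fraction:
    """P(A != B | digests match), assuming an idealized hash where
    unequal inputs collide with probability 2**-hash_bits."""
    collide = (1 - prior_equal) * Fraction(1, 2 ** hash_bits)
    return collide / (prior_equal + collide)


# A mismatch proves the copies differ; a match is accepted as "same".
print(same_digest(b"unchanged file", b"unchanged file"))  # True
print(same_digest(b"unchanged file", b"changed file"))    # False

# Hypothetical prior: 999 of 1000 previously synced files are unchanged.
risk = false_match_probability(Fraction(999, 1000), hash_bits=160)
print(risk < Fraction(1, 2 ** 160))  # True
```

With a prior that high, the chance of being fooled by a matching
digest stays below 2^-160, which is the opposite regime from picking
random inputs.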

> 
> Paul 'WEiRD' de Weerd
> 
> -- 
> >++++++++[<++++++++++>-]<+++++++.>+++[<------>-]<.>+++[<+
> +++++++++++>-]<.>++[<------------>-]<+.--------------.[-]
>                  http://www.weirdnet.nl/                 

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
