Re: cvsync, rsync

Paul de Weerd Fri, 20 Sep 2013 06:34:16 -0700

On Fri, Sep 20, 2013 at 02:49:27PM +0200, Raimo Niskanen wrote:
| On Fri, Sep 20, 2013 at 11:47:06AM +0000, hru...@gmail.com wrote:
| > Andreas Gunnarsson <o-m...@zzlevo.net> wrote:
| > 
| > > On Thu, Sep 19, 2013 at 01:46:20PM +0000, hru...@gmail.com wrote:
| > > > Raimo, if people believe that hash(A)=hash(B) implies A=B, so strong
| > > > believe, that they use it in their programs,
| > >
| > > It's a matter of engineering. Usually that is good enough.
| > >
| > > If you don't think it's good enough then you should probably also worry
| > > about how often strcmp(a, b) returns 0 when strings a and b don't match.
| > 
| > No, that is not good enough, that is not good, that is very bad. The
| > probability of A=B under the condition hash(A)=hash(B) is close
| > to zero, not to one as Marc Espie & Co are telling here. Please, read
| 
| You are again contradicting accepted knowledge without any motivation.
| You can not go on doing that.  It offends people.


He's right, you know.

Let's assume a 1-bit hash function X.  For two random inputs, X(A) =
X(B) has a probability of 50% (p=0.5).  How many possible inputs are
there to X?  Unlimited, therefore the chance that A = B if X(A) = X(B)
is zero.
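
To make the 1-bit case concrete, here is a minimal sketch that counts
it out exhaustively for small inputs.  The parity function is only a
stand-in I picked for "some 1-bit hash X", and the inputs are assumed
to be uniformly random n-bit strings; neither comes from the thread.

```python
from fractions import Fraction
from itertools import product


def one_bit_hash(bits):
    # Hypothetical 1-bit hash (an assumption): parity of the input bits.
    return sum(bits) % 2


def p_equal_given_same_hash(n):
    """Exact P(A == B | X(A) == X(B)) for uniformly random n-bit A and B."""
    inputs = list(product((0, 1), repeat=n))
    same_hash = 0   # pairs (A, B) whose hashes agree
    identical = 0   # of those, pairs where A really equals B
    for a in inputs:
        for b in inputs:
            if one_bit_hash(a) == one_bit_hash(b):
                same_hash += 1
                if a == b:
                    identical += 1
    return Fraction(identical, same_hash)


for n in (1, 2, 4, 8):
    print(n, p_equal_given_same_hash(n))
```

The fraction halves with every extra input bit (2^(1-n) for n-bit
inputs), which is why it heads to zero once the inputs are unbounded.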

Now of course, SHA and MDx are not 1-bit hash functions.  Instead,
they provide longer hashes, 160 bits for SHA-1.  That means there are
2^160 possible hashes.  Yet there are *STILL* infinitely many input
strings to this hash.  Remember that a 5 megabyte file has 2^41943040
possible versions, so the chance that two random 5MB files with the
same SHA-1 sum are the same is about 2^-41942880 (which, by all
practical standards, is zero).  So *PICKING RANDOM INPUT STRINGS*, the
chance that these are the same when you know that their SHA sums match
is still zero.
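
The exponents above are easy to check.  A small sketch, assuming
1 MB = 1024 * 1024 bytes and an idealized SHA-1 that spreads files
evenly over all digests (both are my assumptions for the arithmetic):

```python
import hashlib

HASH_BITS = hashlib.sha1().digest_size * 8   # SHA-1 digest size in bits
FILE_BITS = 5 * 1024 * 1024 * 8              # bits in a 5 MB file

# There are 2**FILE_BITS distinct 5 MB files.  If they are spread evenly
# over 2**HASH_BITS digests, each digest is shared by
# 2**(FILE_BITS - HASH_BITS) files, so two random files with the same
# digest are identical with probability 2**-(FILE_BITS - HASH_BITS).
exponent = FILE_BITS - HASH_BITS

print(HASH_BITS)
print(FILE_BITS)
print(exponent)
```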

However, hruodr's ramblings have *nothing* to do with the rsync or
cvsync case.  You don't pick _RANDOM_ input strings.  You pick strings
*assuming* they're the same (since you've already copied them earlier
- note that rsync will transfer the file in full after it finds it
missing on the receiving side of the transfer).

Assuming A == B (not assuming hash(A) == hash(B)) when this is already
quite likely (because the files were once the same) allows for using
hash functions to check this assumption.  It is then easily verified
or falsified.  This is precisely what rsync does.
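
The difference between the two situations is just the prior.  Here is
a rough Bayes sketch, not anything rsync itself computes: the function
name is made up, and it assumes an idealized 160-bit hash where two
*different* files collide with probability 2^-160 and nobody is
constructing collisions on purpose.

```python
from fractions import Fraction


def p_same_given_match(prior_same, hash_bits=160):
    """P(A == B | hash(A) == hash(B)) by Bayes' rule.

    prior_same is P(A == B) before looking at the hashes.
    Assumption: different files collide with probability 2**-hash_bits.
    """
    prior = Fraction(prior_same)
    false_match = Fraction(1, 2 ** hash_bits)
    # Identical files always match; different files match only by collision.
    return prior / (prior + (1 - prior) * false_match)


# Files you copied earlier: even a coin-flip prior ends up a hair below 1.
synced = p_same_given_match(Fraction(1, 2))
print(float(1 - synced))

# Two unrelated random strings: a tiny prior stays tiny after a match.
random_pair = p_same_given_match(Fraction(1, 2 ** 1000))
print(random_pair < Fraction(1, 2 ** 800))
```

With any reasonable prior that the files are the same, a matching hash
pushes the probability to essentially one; with a random-string prior,
it stays at essentially zero.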

So yeah, for random inputs he is right that p(A == B) = 0 when it's
given that hash(A) == hash(B).  However, he's tricking you into thinking he's
still talking about rsync when he isn't.  I have no idea what the
relevance of his point is, but he is right about it.  He could've also
mentioned that the pope is catholic or that the sun is hot.  Great
stuff; rsync is still OK.

An elaborate troll, but a troll nonetheless.

I am *SO* glad to have contributed to yet another megathread.

Paul 'WEiRD' de Weerd

-- 
>++++++++[<++++++++++>-]<+++++++.>+++[<------>-]<.>+++[<+
+++++++++++>-]<.>++[<------------>-]<+.--------------.[-]
                 http://www.weirdnet.nl/
