Rep-sharing collisions, even if one does occur somewhere in the world (has this actually happened in practice?), have no effect on any repository that does not itself contain the colliding content.

There are many widely used systems that rely on statistical improbability. Disk drive hardware failure is one example, and I don't understand why Michael discounts it. Another huge example is the use of UUIDs. Take an SCM system like ClearCase: every object created gets a UUID. A UUID is generated in isolation according to some formula (location plus time, or random bits), and it too has a chance of collision (even the time-based variant presumes that the machine clock always moves forward).
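As an illustration, here is a small sketch of both UUID flavours using Python's standard uuid module (nothing Subversion- or ClearCase-specific, just to show what the "formula" looks like in practice):

    import uuid

    # Version 1: MAC address + 60-bit timestamp + clock sequence.
    # Unique only as long as the clock never runs backwards and the
    # MAC address is not duplicated on the network.
    u1 = uuid.uuid1()

    # Version 4: 122 random bits.  Uniqueness rests purely on the
    # improbability of the same random value ever appearing twice.
    u4 = uuid.uuid4()

    print(u1)
    print(u4)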

Why should Subversion solve a theoretical problem that doesn't seem to exist in the real world?

I agree with Hyrum. If you don't like it, turn it off. I don't see the problem, and I would prefer the developers work on *real*, *demonstrable* problems, like merge conflicts involving file renames.
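(For anyone who does want to turn it off: if I remember the 1.6 layout correctly, it is a single setting in the repository's db/fsfs.conf:

    [rep-sharing]
    enable-rep-sharing = false

As far as I know, this only stops deduplication for commits made after the change; existing data is left untouched.)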

Michael: Feel free to show a *real* repository where the rep-sharing cache has caused corruption due to the use of SHA-1.

Cheers,
mark


On 06/25/2010 09:37 AM, Hyrum K. Wright wrote:
On Fri, Jun 25, 2010 at 1:45 PM, <michael.fe...@evonik.com> wrote:
Hello,

I am actually more interested in finding a reliable solution
instead of discussing mathematics and probabilities.
You can just disable rep-sharing.  You won't have the space savings,
but you'll feel better inside, knowing that the near-zero probability
of hash collisions is now nearer to zero.  I'm sorry that you can't
have your cake and eat it too.

Subversion 1.6.x has been released for over 16 months, and is in use
by *millions* of users.  We've yet to have a single complaint about
hash collisions.  While you may argue that this anecdotal evidence is
not a proof of correctness, I would claim that in this case, it is a
pretty good indicator.

...
So there are 256^1024 = 1.09 * 10^2466 different data sequences
of 1K size.
This means for every hash value there are
  (256^1024) / (2^128)
= (2^(8*1024)) / (2^128)
= (2^8192) / (2^128)
= 2^(8192-128)
= 2^8064
= 3.21 * 10^2427 sequences of data of 1K size
represented by the same hash value.
When you find a disk which will hold even a significant fraction of
these 3.21 * 10 ^ 2427 1K sequences, let's talk. :)

-Hyrum
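
For what it's worth, the arithmetic quoted above is easy to reproduce with Python's arbitrary-precision integers (a quick sanity check, nothing more; rep-sharing actually keys on SHA-1, which is 160 bits rather than 128, but the conclusion does not change):

    # Sanity check of the figures quoted above, using Python's
    # built-in arbitrary-precision integers.
    sequences = 256 ** 1024          # all possible 1K byte sequences
    hashes    = 2 ** 128             # distinct 128-bit hash values
    per_hash  = sequences // hashes  # sequences sharing any single hash value

    print(len(str(sequences)) - 1)   # 2466 -> roughly 1.09 * 10^2466
    print(len(str(per_hash)) - 1)    # 2427 -> roughly 3.21 * 10^2427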



--
Mark Mielke<m...@mielke.cc>
