Rep-sharing collisions, even if one does occur somewhere in the world (has this actually happened in practice?), have no effect on any repository that does not itself contain the colliding content.

There are many widely used systems that rely on statistical improbability. Disk drive hardware failure is one example, and I don't understand why Michael discounts it. Another huge example is the use of UUIDs. Take an SCM system like ClearCase: every object created gets a UUID. A UUID is generated in isolation according to some formula (location plus time, or random bits), and it too has a chance of collision (even the time-based variant presumes that the machine clock always moves forward).
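As an illustration, here is a small sketch of both UUID flavours using Python's standard uuid module (nothing Subversion- or ClearCase-specific, just to show what the "formula" looks like in practice):

    import uuid

    # Version 1: MAC address + 60-bit timestamp + clock sequence.
    # Unique only as long as the clock never runs backwards and the
    # MAC address is not duplicated on the network.
    u1 = uuid.uuid1()

    # Version 4: 122 random bits.  Uniqueness rests purely on the
    # improbability of the same random value ever appearing twice.
    u4 = uuid.uuid4()

    print(u1)
    print(u4)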

Why should Subversion solve a theoretical problem that doesn't seem to exist in the real world?

I agree with Hyrum. If you don't like it, turn it off. I don't see the problem, and I would prefer the developers work on *real*, *demonstrable* problems, like merge conflicts involving file renames.
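(For anyone who does want to turn it off: if I remember the 1.6 layout correctly, it is a single setting in the repository's db/fsfs.conf:

    [rep-sharing]
    enable-rep-sharing = false

As far as I know, this only stops deduplication for commits made after the change; existing data is left untouched.)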

Michael: Feel free to show a *real* repository where the rep-sharing cache has caused corruption due to the use of SHA-1.

Cheers,
mark


On 06/25/2010 09:37 AM, Hyrum K. Wright wrote:
On Fri, Jun 25, 2010 at 1:45 PM, <michael.fe...@evonik.com> wrote:
Hello,

I am actually more interested in finding a reliable solution
instead of discussing mathematics and probabilities.
You can just disable rep-sharing.  You won't have the space savings,
but you'll feel better inside, knowing that the near-zero probability
of hash collisions is now nearer to zero.  I'm sorry that you can't
have your cake and eat it too.

Subversion 1.6.x has been released for over 16 months, and is in use
by *millions* of users.  We've yet to have a single complaint about
hash collisions.  While you may argue that this anecdotal evidence is
not a proof of correctness, I would claim that in this case, it is a
pretty good indicator.

...
So there are 256^1024 = 1.09 * 10^2466 different data sequences
of 1K size.
This means for every hash value there are
  (256^1024) / (2^128)
= (2^(8*1024)) / (2^128)
= (2^8192) / (2^128)
= 2^(8192-128)
= 2^8064
= 3.21 * 10^2427 sequences of data of 1K size
represented by the same hash value.
When you find a disk which will hold even a significant fraction of
these 3.21 * 10 ^ 2427 1K sequences, let's talk. :)

-Hyrum
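
For what it's worth, the arithmetic quoted above is easy to reproduce with Python's arbitrary-precision integers (a quick sanity check, nothing more; rep-sharing actually keys on SHA-1, which is 160 bits rather than 128, but the conclusion does not change):

    # Sanity check of the figures quoted above, using Python's
    # built-in arbitrary-precision integers.
    sequences = 256 ** 1024          # all possible 1K byte sequences
    hashes    = 2 ** 128             # distinct 128-bit hash values
    per_hash  = sequences // hashes  # sequences sharing any single hash value

    print(len(str(sequences)) - 1)   # 2466 -> roughly 1.09 * 10^2466
    print(len(str(per_hash)) - 1)    # 2427 -> roughly 3.21 * 10^2427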



--
Mark Mielke<m...@mielke.cc>
