I think much in the same vein as git, venti doesn't need to worry too
much about collisions given the behavior when collisions occur is
well-defined and sensible in both systems.
It's second preimages that are more of a concern (and those are still
not practical against SHA1). The lack of second-preimage attacks on
SHA1 prevents people from maliciously creating a file with the same
hash as one you created. They can only produce collisions between files
they crafted themselves, which should limit the scope of any
maliciousness to data they already control.
Once second preimages become practical I'd want to be long gone from
SHA1, but IIRC even MD5 still has no practical second-preimage attack,
so we're probably a long way off from there.
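
To put rough numbers on that: for an ideal n-bit hash, finding any
collision takes about 2^(n/2) work (the birthday bound), while finding
a second preimage for one specific existing block takes about 2^n.
Below is a minimal Python sketch of those idealized figures. It ignores
the cryptanalytic shortcuts that made the real SHA1 collision
practical, and the 2^40-block store is a made-up example size.

```python
def generic_costs(bits: int) -> tuple[int, int]:
    """log2 of the idealized brute-force work for an n-bit hash."""
    collision = bits // 2     # birthday bound: any two inputs that match
    second_preimage = bits    # must match one specific, existing digest
    return collision, second_preimage


def random_collision_probability(blocks: int, bits: int) -> float:
    """Approximate chance that any two of `blocks` random blocks share a digest.

    There are blocks*(blocks-1)/2 pairs, each matching with probability 2^-bits.
    """
    return blocks * (blocks - 1) / 2 ** (bits + 1)


for name, bits in [("SHA1", 160), ("BLAKE2b-512", 512)]:
    c, p = generic_costs(bits)
    print(f"{name}: collision ~2^{c}, second preimage ~2^{p}")

# Hypothetical store of 2^40 blocks hashed with SHA1 (roughly 2^-81).
print(random_collision_probability(2**40, 160))
```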

Technically, anything relying on venti should handle the
collision-detected response gracefully, since a collision is always a
possibility no matter the algorithm.
If fossil doesn't handle it very well, perhaps it's fossil that needs
changing rather than venti (given venti detects and reports it).
A top-of-the-head suggestion would be for fossil to respond to the
collision notice by doing something to the block that can be undone
later (as others above have hinted at) such as appending something,
XOR, etc., marking it as such in its own data structures, then passing
it back to venti. It could then reverse the operation when retrieving
the files with the 'collision fixed' flag set.
I don't know how feasible that idea is (it's been a while since I
looked at fossil), but maybe it's worth looking into? At a cursory
glance it would seem to fix the problem for fossil+venti indefinitely,
at the cost of a minor computational overhead when retrieving collided
files.
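
A minimal Python sketch of that idea, assuming a toy in-memory store
rather than the real venti protocol: ToyVenti, ToyFossil, Collision and
xor_mask are all hypothetical names. A deliberately weak hash (SHA1 of
only the first 4 bytes) stands in for a real collision so that the
fallback path actually gets exercised.

```python
import hashlib
from typing import Callable


class Collision(Exception):
    """Raised when a different block is already stored under the same score."""


class ToyVenti:
    """Hypothetical in-memory content-addressed store (not the real venti protocol)."""

    def __init__(self, hashfn: Callable[[bytes], bytes] = lambda b: hashlib.sha1(b).digest()):
        self.hashfn = hashfn
        self.blocks: dict[bytes, bytes] = {}

    def write(self, data: bytes) -> bytes:
        score = self.hashfn(data)
        stored = self.blocks.get(score)
        if stored is not None and stored != data:
            raise Collision(score.hex())  # detect and report, like venti does
        self.blocks[score] = data
        return score

    def read(self, score: bytes) -> bytes:
        return self.blocks[score]


def xor_mask(data: bytes, mask: int = 0xFF) -> bytes:
    """Self-inverse transform: applying it twice gives back the original block."""
    return bytes(b ^ mask for b in data)


class ToyFossil:
    """Hypothetical client that keeps a 'collision fixed' flag next to each score."""

    def __init__(self, venti: ToyVenti):
        self.venti = venti

    def put(self, data: bytes) -> tuple[bytes, bool]:
        try:
            return self.venti.write(data), False
        except Collision:
            # Store a transformed copy instead and remember to undo it on read.
            return self.venti.write(xor_mask(data)), True

    def get(self, score: bytes, fixed: bool) -> bytes:
        data = self.venti.read(score)
        return xor_mask(data) if fixed else data


# Deliberately broken hash: only the first 4 bytes count, so collisions are easy.
fs = ToyFossil(ToyVenti(lambda b: hashlib.sha1(b[:4]).digest()))
first = fs.put(b"AAAA first file")
second = fs.put(b"AAAA second file")   # same prefix, so a simulated collision
print(first[1], second[1])             # False True
```

XOR with a fixed mask is used here only because it is its own inverse.
If the transformed block happened to collide as well, this sketch just
lets the second Collision propagate to the caller.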

As Charles pointed out, you could also just do that in venti. I guess
it depends on whether the contract of venti's write call is "returns
SHA1 of file" or "returns arbitrary file id".
If that behavior were put into venti, you couldn't assume the ID
returned = sha1(block) anymore, but I don't know if anything relies on
that.
As for venti, I wouldn't say there's 'no point' to an algorithm update,
but I'd rather see fossil updated to deal with collisions better first.
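
For comparison, a rough sketch of the venti-side variant, again a
hypothetical toy store (ToyRescoringVenti is a made-up name) rather
than venti's actual code. On a collision it probes for a fresh id
derived from the original score plus a counter, so the returned id is
opaque and no longer necessarily equal to sha1(block).

```python
import hashlib
from typing import Callable


class ToyRescoringVenti:
    """Hypothetical store whose write returns an opaque id, not a guaranteed SHA1."""

    def __init__(self, hashfn: Callable[[bytes], bytes] = lambda b: hashlib.sha1(b).digest()):
        self.hashfn = hashfn
        self.blocks: dict[bytes, bytes] = {}

    def write(self, data: bytes) -> bytes:
        natural = self.hashfn(data)
        score, salt = natural, 0
        # Probe until we find the same block (dedup) or an unused id.
        while score in self.blocks and self.blocks[score] != data:
            salt += 1
            score = hashlib.sha1(natural + salt.to_bytes(4, "big")).digest()
        self.blocks[score] = data
        return score

    def read(self, score: bytes) -> bytes:
        return self.blocks[score]


def weak(data: bytes) -> bytes:
    """Deliberately broken hash: only the first 4 bytes count."""
    return hashlib.sha1(data[:4]).digest()


v = ToyRescoringVenti(weak)
id_a = v.write(b"AAAA first file")
id_b = v.write(b"AAAA second file")   # collides, so it gets a derived id
print(id_a == weak(b"AAAA first file"), id_b == weak(b"AAAA second file"))  # True False
```

Note that which block keeps its "natural" score depends on write order,
which is part of what makes consolidating two stores awkward.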


On Mon, Feb 27, 2017 at 8:14 PM, Bakul Shah <ba...@bitblocks.com> wrote:
> On Mon, 27 Feb 2017 19:02:29 GMT Charles Forsyth <charles.fors...@gmail.com> 
> wrote:
>> On 27 February 2017 at 18:30, Charles Forsyth <charles.fors...@gmail.com>
>> wrote:
>>
>> > that's a separate argument that venti would never work for you, regardless
>> > of the hash algorithm used.
>
>> since venti returns the resulting score from each write, and it knows
>> whether there's been a collision,
>> it appears it could return a modified score (having ensured that is now
>> unique, "and the next judge said that's a very shaggy dog")
>
> Consider what can happen when you want to consolidate two venti
> archives into another one. Each source venti has a different
> file with the same hash. When you discover in the destination
> venti that they collide, it is too late to return a modified
> score -- you have to find and fix all pointer blocks that
> refer to this block as well.
>
> In theory the chance of a random collision with SHA1 may be
> 1 in 2^80 but we have existing files that collide (unlike the
> hypothetical argument of someone wanting to store 10^21 byte
> size files -- but if they can produce it, we can store it!).
> Your argument is that since venti is readonly, existing data
> in it is not vulnerable, but not everyone stores their archives
> on readonly medium.  Another argument would be that almost
> always venti is privately used and unlikely to be accessible
> to the badguys.  Yet another argument is that hardly anyone
> uses venti so why even bother. These are behavior patterns
> that are true today but why limit its usefulness?
>
> Just as we move archived data we care about to more modern
> media (as we no longer have easy access to floppies, 9track
> tapes, 1/4" streamer tape etc.), and update our crypto keys,
> since they too have limited shelf-life, we can replace the use
> of SHA1.  This is a fixable problem.  [It is much much worse
> for git given the amount of s/w that relies on it. I think
> it is a matter of time before someone comes up with a
> collision between two different types of git objects (such as
> a blob and a tree) but we'll let Linus worry about it :-)]
>
> The solution is to convert from sha1 to blake2b or something
> strong and be prepared to move the data again in 10-20 years.
>
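
A minimal sketch of what such a migration involves, under loudly stated
assumptions: an in-memory dict stands in for the store, blocks carry a
hypothetical "data"/"pointer" type tag, and a pointer block is simply
its children's 20-byte SHA1 scores concatenated. Because pointer blocks
contain scores, they have to be re-hashed bottom-up too, which is the
same reason a score can't just be swapped after a merge.

```python
import hashlib
from typing import Callable

# Hypothetical store: score -> (block type, contents).
Store = dict[bytes, tuple[str, bytes]]
SHA1_LEN = 20  # bytes per SHA1 score inside an old pointer block


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def blake2b(data: bytes) -> bytes:
    return hashlib.blake2b(data).digest()  # 64-byte digest by default


def put(store: Store, hashfn: Callable[[bytes], bytes], kind: str, data: bytes) -> bytes:
    score = hashfn(data)
    store[score] = (kind, data)
    return score


def migrate(old: Store, root: bytes, new: Store) -> bytes:
    """Copy the tree under `root` into `new`, re-scoring bottom-up with BLAKE2b."""
    kind, data = old[root]
    if kind == "pointer":
        # A pointer block's contents *are* child scores, so it must be rewritten.
        kids = [data[i:i + SHA1_LEN] for i in range(0, len(data), SHA1_LEN)]
        data = b"".join(migrate(old, kid, new) for kid in kids)
    return put(new, blake2b, kind, data)


old: Store = {}
hello = put(old, sha1, "data", b"hello")
world = put(old, sha1, "data", b"world")
root = put(old, sha1, "pointer", hello + world)

new: Store = {}
new_root = migrate(old, root, new)
kind, body = new[new_root]
print(kind, len(new_root), len(body))  # pointer 64 128
```

Shared subtrees get re-hashed once per reference in this sketch; a real
tool would presumably keep an old-to-new score map to avoid that.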
