On Sat, 16 Apr 2005, Brian O'Mahoney wrote:

Three points:
(1) I _have_ seen real-life collisions with MD5, in the context of
   document management systems containing ~10^6 MS Word documents.
(2) The MAC (ethernet hardware address) of any interface _should_
   help to make a unique Id.

you want a unique ID that can be computed directly from the file contents.

what file integrity programs (a la tripwire) do is use multiple identification routines (aide uses MD4+MD5+filesize IIRC)
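
As a rough illustration of the "multiple identification routines" idea, here is a minimal Python sketch. It is not how tripwire or aide are implemented; the `fingerprint` and `same_file` names are made up for this example, and SHA1 is swapped in for MD4 because MD4 is often unavailable in modern `hashlib` builds.

```python
import hashlib
from typing import Tuple


def fingerprint(data: bytes) -> Tuple[str, str, int]:
    """Return (MD5 hex, SHA1 hex, size in bytes) for the given contents.

    Hypothetical helper: combining independent checks means a collision
    in one digest alone is not enough to make two files look identical.
    """
    return (
        hashlib.md5(data).hexdigest(),
        hashlib.sha1(data).hexdigest(),
        len(data),
    )


def same_file(a: bytes, b: bytes) -> bool:
    """Treat two contents as equal only if every component matches."""
    return fingerprint(a) == fingerprint(b)
```

Two files would then have to collide under both digests and have the same length before they were mistaken for each other.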


David Lang wrote:
On Sat, 16 Apr 2005, Ingo Molnar wrote:

* David Lang <[EMAIL PROTECTED]> wrote:

this issue was raised a few days ago in the context of someone
tampering with the files and it was decided that the extra checks were
good enough to prevent this (at least for now), but what about
accidental collisions?

if I am understanding things right the objects get saved in the
filesystem in filenames that are the SHA1 hash. if two legitimate
files have the same hash I don't see any way for both of them to
exist.
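
To make the concern concrete, here is a simplified Python sketch of a store where the filename is derived only from the hash of the contents. The `object_path` helper and the flat `root/<hash>` layout are assumptions for illustration; git's real object format and directory layout differ.

```python
import hashlib
import os


def object_path(root: str, data: bytes) -> str:
    """Hypothetical layout: the file name is just the SHA1 hex digest.

    Because the path depends on nothing but the digest, two different
    contents with the same digest would be forced into the same file.
    """
    digest = hashlib.sha1(data).hexdigest()
    return os.path.join(root, digest)
```

Identical contents always map to the same path, which is the intended behaviour; the problem case is different contents that happen to share a digest.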

yes the risk of any two files having the same hash is low, but in the
earlier thread someone chimed in and said that they had two files on
their system that had the same hash.


you can add -DCOLLISION_CHECK to Makefile:CFLAGS to turn on collision
checking (disabled currently). If there indeed exist two files that have
different content but the same hash, could someone send those two files?


remember that the flap over SHA1 being 'broken' a couple weeks ago was
not from researchers finding multiple files with the same hash, but
finding that it was more likely than expected that files would have the
same hash.

there was a discussion on LKML within the last year about using MD5
hashes for identifying unique filesystem blocks (with the idea of being
able to merge identical blocks) and in that discussion it was pointed
out that collisions are a known real-life issue.

so if collision detection is turned on in git, does that make it error
out if it runs into a second file with the same hash, or does it do
something else?
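
One possible answer to that question, sketched in Python: compare the stored bytes whenever a key already exists and error out on a mismatch. This is only an illustration of the "error out" option, not a description of what -DCOLLISION_CHECK actually does in git; `ObjectStore` and `HashCollision` are hypothetical names, and the store is kept in memory for brevity.

```python
import hashlib
from typing import Callable, Dict


class HashCollision(Exception):
    """Raised when two different contents map to the same key."""


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class ObjectStore:
    """In-memory content-addressed store with a collision check on write."""

    def __init__(self, hash_fn: Callable[[bytes], str] = sha1_hex) -> None:
        self._hash_fn = hash_fn
        self._objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = self._hash_fn(data)
        existing = self._objects.get(key)
        # Same key but different bytes: refuse rather than silently
        # keeping only one of the two contents.
        if existing is not None and existing != data:
            raise HashCollision(key)
        self._objects[key] = data
        return key
```

The hash function is injectable so that a collision can be forced with a deliberately weak stand-in when testing, without needing a real SHA1 collision.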

David Lang


-- Brian


-- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare
