Excuse me, but I originally sent the following e-mail to Hyrum K. Wright 
directly,
because I was not familiar with the guidelines of the Subversion project.

----- Forwarded by Michael Felke/AN/Stockhausen/DE on 24.06.2010 
11:09 -----


Michael Felke
23.06.2010 14:07
 
        To:      hwri...@tigris.org
        Cc: 
        Subject: subversion Issue 2286: rep-sharing cache for fsfs

Hello Hyrum K. Wright,

sorry to bother you with this directly, but I do not know how to work 
with the issue tracker.

I have just started checking the changes in 1.6 for possible problems 
before updating our raw data repository to this version.
I found that the new representation caching would have a great impact at 
our site.

It could save us a lot of disk space on the server, because the software 
we are using often generates file copies, which are added as separate 
files.

But unfortunately it seems we cannot use it :-(
As far as I can tell from the source code of rep.cache.c and fs_fs.c in 
libsvn_fs_fs, the mechanism for finding an already existing 
representation relies only on the sha1-checksum.
Because hash collisions are possible, this is not enough to ensure that 
the old representation found is really a duplicate of the new one.
An undetected hash collision would result in a file with totally wrong 
contents.

sha1 was developed to detect modifications to a file and to ensure 
that it is practically impossible to produce the same sha1-checksum merely 
by modifying the file. 
So it is well suited to checking whether a file has been modified.
But it was not designed to decide whether two independently created files 
are identical. 
There are always infinitely many different files that generate the same 
checksum.
Admittedly, the number of colliding files is finite for a given file 
size, but it still grows dramatically with the file size.
So additionally checking the file size helps, but is not a completely 
satisfying solution.
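
The counting argument above can be made concrete with a small Python 
sketch. It assumes only that a sha1-checksum has 160 bits; the function 
names are made up for illustration and are not part of Subversion.

```python
from fractions import Fraction

SHA1_BITS = 160  # a sha1-checksum is 160 bits long


def files_of_size(n_bytes: int) -> int:
    """Number of distinct files that are exactly n_bytes bytes long."""
    return 256 ** n_bytes


def average_files_per_checksum(n_bytes: int) -> Fraction:
    """Distinct files of the given size divided by possible sha1 values.

    Whenever this ratio is greater than 1, at least two files of that
    size must share a checksum (pigeonhole principle).
    """
    return Fraction(files_of_size(n_bytes), 2 ** SHA1_BITS)


if __name__ == "__main__":
    for size in (20, 21, 64):
        ratio = average_files_per_checksum(size)
        print(f"{size} bytes: {ratio} files per checksum value on average")
```

For files of exactly 20 bytes the ratio is 1, and every additional byte 
multiplies it by 256. This says nothing about how likely an accidental 
collision is in practice, only that fixing the size cannot rule one out.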

The number of undetected hash collisions could easily be reduced by also 
checking the md5-checksum, the size and the expanded-size.
To make this feature totally reliable, a complete comparison of the file's 
content with the content of the old representation found is necessary.
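
The checks proposed above could look roughly like the following Python 
sketch. It is a toy model, not the code in libsvn_fs_fs: the class 
RepStore, the helper find_collision and the key function weak_key are 
hypothetical. Because no sha1 collision is at hand, weak_key keeps only 
the first byte of the sha1-checksum so that a collision can be produced 
on purpose.

```python
import hashlib


def weak_key(data: bytes) -> bytes:
    """Deliberately weak lookup key: first byte of the sha1-checksum."""
    return hashlib.sha1(data).digest()[:1]


def find_collision(key_func):
    """Return two different inputs with the same key.

    With a one-byte key there are only 256 key values, so 257 distinct
    inputs are guaranteed to contain a collision.
    """
    seen = {}
    for i in range(257):
        data = str(i).encode()
        key = key_func(data)
        if key in seen:
            return seen[key], data
        seen[key] = data
    raise RuntimeError("no collision found; key is longer than one byte")


class RepStore:
    """Toy store that shares one representation between equal contents."""

    def __init__(self, key_func, verify=True):
        self._key_func = key_func
        self._verify = verify
        self._reps = []    # stored contents, addressed by index
        self._index = {}   # lookup key -> indices of stored contents

    def add(self, data: bytes) -> int:
        """Store data, or return the id of a representation to share."""
        candidates = self._index.setdefault(self._key_func(data), [])
        for rep_id in candidates:
            # Without verification the checksum match alone is trusted.
            if not self._verify or self._same(self._reps[rep_id], data):
                return rep_id
        self._reps.append(data)
        candidates.append(len(self._reps) - 1)
        return len(self._reps) - 1

    def get(self, rep_id: int) -> bytes:
        return self._reps[rep_id]

    @staticmethod
    def _same(old: bytes, new: bytes) -> bool:
        # Cheap checks first; the full comparison makes the result certain.
        if len(old) != len(new):
            return False
        if hashlib.md5(old).digest() != hashlib.md5(new).digest():
            return False
        return old == new


if __name__ == "__main__":
    a, b = find_collision(weak_key)
    unsafe = RepStore(weak_key, verify=False)
    unsafe.add(a)
    print("unverified store returns wrong content:",
          unsafe.get(unsafe.add(b)) != b)
    safe = RepStore(weak_key, verify=True)
    safe.add(a)
    print("verified store returns right content:",
          safe.get(safe.add(b)) == b)
```

The size and md5 checks only make a rejection cheaper; it is the final 
byte-by-byte comparison that guarantees a shared representation really 
has the same content.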

Yours sincerely

Michael Felke
Phone +49 2151 38-1453
Fax +49 2151 38-1094
michael.fe...@evonik.com
Evonik Stockhausen GmbH
Bäkerpfad 25
47805 Krefeld
http://www.evonik.com

Management: Gunther Wittmer (Spokesman), Willibrord Lampen

Registered office: Krefeld
Court of registration: Amtsgericht Krefeld; Commercial Register HRB 5791

