Excuse me, but I originally wrote the following e-mail to Hyrum K. Wright directly, because I wasn't familiar with the guidelines of the Subversion project.
----- Forwarded by Michael Felke/AN/Stockhausen/DE on 24.06.2010 11:09 -----
Michael Felke
23.06.2010 14:07
To: hwri...@tigris.org
Cc:
Subject: subversion Issue 2286: rep-sharing cache for fsfs

Hello Hyrum K. Wright,

Sorry to bother you with this directly, but I have no idea how to work with the issue tracker.

I have just started checking the changes in 1.6 for possible problems before upgrading our raw data repository to this version. I found that the new representation caching could have a great impact on our site. It could save us a lot of disk space on the server, because the software we use often generates file copies, which are added as separate files.

Unfortunately, it seems we cannot use it :-( From my reading of the source code of rep-cache.c and fs_fs.c in libsvn_fs_fs, the mechanism for finding an already existing representation relies only on the SHA-1 checksum. Because hash collisions are possible, this is not enough to ensure that the old representation found is really a duplicate of the new one. An undetected hash collision would result in a file with completely wrong contents.

SHA-1 was developed to detect modifications to a file, and to make it practically impossible to produce the same SHA-1 checksum merely by modifying a file. So it is well suited to determining whether a file has been modified. But it is not designed to determine whether two arbitrary files are identical. For every checksum there are infinitely many different files that produce it. Admittedly, for a given file size the number of colliding files is finite, but it still grows dramatically as the file size increases. So additionally checking the file size helps, but it is not a completely satisfactory solution. The number of undetected hash collisions could easily be reduced by also checking the MD5 checksum, the size and the expanded size.
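The claim that collisions must exist, and that their number grows with file size, follows from a pigeonhole count. The count below assumes files of exactly $n$ bytes and SHA-1's 160-bit digest:

```latex
\begin{align*}
\#\{\text{files of } n \text{ bytes}\} &= 256^{n} = 2^{8n}, \\
\#\{\text{SHA-1 digests}\} &= 2^{160}, \\
\text{average number of files per digest} &= \frac{2^{8n}}{2^{160}} = 2^{8n-160}.
\end{align*}
```

For $n > 20$ bytes, $2^{8n} > 2^{160}$. At least one digest must therefore be shared by two different files of that size, and the average grows by a factor of 256 with each additional byte. The count shows only that collisions exist. It does not by itself say how likely one is to occur in a given repository.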
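The extra checks proposed above can be sketched as a lookup that treats a SHA-1 match only as a candidate. The sketch confirms the candidate by comparing the fulltext length and the MD5 checksum, with an optional byte-for-byte comparison as a final step. This is a hypothetical in-memory model: `CachedRep`, `RepShareCache`, `remember`, `find_reusable` and `read_fulltext` are illustrative names, not Subversion's real rep-cache schema or API. The on-disk size field is also left out, since it depends on how a representation is stored.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class CachedRep:
    """Hypothetical cache entry; NOT Subversion's actual rep-cache schema."""
    rep_id: str         # illustrative handle for an already stored representation
    md5: str            # MD5 hex digest of the fulltext
    expanded_size: int  # length of the fulltext in bytes


class RepShareCache:
    """Toy rep-sharing cache keyed by SHA-1, with additional verification."""

    def __init__(self) -> None:
        self._by_sha1: Dict[str, CachedRep] = {}

    def remember(self, rep_id: str, fulltext: bytes) -> None:
        # Record a newly stored representation under its SHA-1 digest.
        sha1 = hashlib.sha1(fulltext).hexdigest()
        md5 = hashlib.md5(fulltext).hexdigest()
        self._by_sha1[sha1] = CachedRep(rep_id, md5, len(fulltext))

    def find_reusable(
        self,
        fulltext: bytes,
        read_fulltext: Optional[Callable[[str], bytes]] = None,
    ) -> Optional[CachedRep]:
        """Return a cached rep only if it passes every enabled check."""
        hit = self._by_sha1.get(hashlib.sha1(fulltext).hexdigest())
        if hit is None:
            return None  # no candidate: the content must be stored as a new rep
        # Secondary checks suggested in the text: the length and the MD5
        # digest of the candidate must also match the new content.
        if hit.expanded_size != len(fulltext):
            return None
        if hit.md5 != hashlib.md5(fulltext).hexdigest():
            return None
        # Optional definitive check: compare the stored content byte for byte.
        if read_fulltext is not None and read_fulltext(hit.rep_id) != fulltext:
            return None
        return hit
```

A caller that can read back stored content would pass a `read_fulltext` callback, paying the cost of one extra read per cache hit in exchange for ruling out a wrong match entirely. Without the callback, the lookup falls back to the cheaper metadata checks.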
To make this feature completely reliable, a full comparison of the new file's content with the content of the old representation that was found is necessary.

Yours sincerely,
Michael Felke
michael.fe...@evonik.com
Evonik Stockhausen GmbH