Andreas Gunnarsson <o-m...@zzlevo.net> wrote:
> On Thu, Sep 19, 2013 at 01:46:20PM +0000, hru...@gmail.com wrote:
> > Raimo, if people believe that hash(A)=hash(B) implies A=B, so strong
> > believe, that they use it in their programs,
>
> It's a matter of engineering. Usually that is good enough.
>
> If you don't think it's good enough then you should probably also worry
> about how often strcmp(a, b) returns 0 when strings a and b don't match.

No, that is not good enough; that is not good, that is very bad. The
probability of A=B given hash(A)=hash(B) is close to zero, not close to
one as Marc Espie & Co. are claiming here. Please read my answer to
Henning Brauer from Wednesday to see why rsync gives correct answers in
practice.

---

On Thu, 19 Sep 2013, Matthew Weigel wrote:
> That seems like a useful exercise for you to do. Like Marc said very early
> on, rsync is based in part on Andrew Tridgell's PhD Thesis, "Efficient
> Algorithms for Sorting and Synchronization." You can find it and read it at
> http://www.samba.org/~tridge/phd_thesis.pdf.

Thanks. At the end he checks the transmitted file against a checksum,
and if it does not pass the check, the algorithm is repeated with a
variation of the checksum to find "equal parts". Although Tridgell is
very optimistic when calculating probabilities, he found this last check
necessary. With very big, very different files, rsync will perhaps not
be as efficient as promised, because of this backtracking.

> If you are still worried about it, you are trolling either misc@ or
> yourself or both.

If I express criticism, then I am XYZ, where XYZ ranges (so far) from
troll through idiot to motherf****r. Is this an honest way of arguing?
And what is a person who argues in this way?

Rodrigo.
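
The claim in this message that A=B is unlikely given hash(A)=hash(B) can be checked on a toy case. The sketch below is only an illustration under loud assumptions: `toy_hash` is a hypothetical 8-bit hash invented for this example (it is not the checksum rsync uses), and A and B are assumed to be drawn independently and uniformly from all 16-bit values. Real inputs, such as two versions of the same file, are not independent, so the number does not transfer directly to rsync.

```python
from collections import Counter


def toy_hash(value: int) -> int:
    """Hypothetical 8-bit hash: XOR the high byte of a 16-bit value into the low byte."""
    return (value ^ (value >> 8)) & 0xFF


def prob_equal_given_same_hash(n_inputs: int) -> float:
    """P(A == B | toy_hash(A) == toy_hash(B)) for A, B uniform and independent
    over range(n_inputs) (an assumption made for this illustration)."""
    # Count how many inputs land on each hash value.
    buckets = Counter(toy_hash(v) for v in range(n_inputs))
    # Ordered pairs (A, B) that share a hash value: sum of squared bucket sizes.
    same_hash_pairs = sum(count * count for count in buckets.values())
    # Ordered pairs with A == B: exactly one per input.
    equal_pairs = n_inputs
    return equal_pairs / same_hash_pairs


if __name__ == "__main__":
    print(prob_equal_given_same_hash(1 << 16))
```

Under these assumptions the ratio is roughly (number of hash values) / (number of possible inputs), so it shrinks as the input space grows relative to the hash size.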