Marc Espie <es...@nerim.net> wrote: > On Mon, Sep 16, 2013 at 08:16:50PM +0000, hru...@gmail.com wrote: > > Marc Espie <es...@nerim.net> wrote: > > > > > > >From a checksum I expect two things: (1) the pre-images of elements > > > > in the range have all similar sizes, > > > Why ? This makes no sense, and is in contradiction with (2). > > > > I must correct my previous mail. The Domain is numerable, to speak > > about cardinality as size do not help, but perhaps you get the > > idea. The probability of colisions is stronly dependent on how the > > hash distributes the elements of the domain in the range. > > That's still a crypto hash. Apart from specially crafted attacks, > input size is irrelevant to the chance of collision.
It is not the input size, it is how the input is mapped. In the case of rsync the hash is applied to strings of a fixed lenth. In this case the input is finite and we can argue with cardinality. Just imagine the set finite strings mapped to a single element in the range. If all these sets have the same number of elements and the range n elements, then the probability of colition is n*(1/n)^2=1/nr; otherwise it is greater (simple school agebra to calculate it). The extreme case is that all strings are mapped to the same element. Rodrigo.