Raimo Niskanen <raimo+open...@erix.ericsson.se> wrote:

> A resembling application is the Git version control system that is
> based on the assumption that all content blobs can be uniquely
> described by their 128-bit SHA1 hash value. If two blobs have
> the same hash value they are assumed to be identical.
The developers of rsync and git may be careful, and their programs may be reliable in practice. But perhaps in the future we will have a lot of mediocre probabilistic programmers who don't care that much.

> If SHA1 is a perfect cryptographic hash value the probability
> for mistake is as has been said before 2^-128 which translates
> to (according to the old MB vs MiB rule of 10 bit corresponding
> to 3 decimal digits) around 10^-37.

Your sentence above begins with the two letters "if".

> According to a previous post in this thread the probability for
> disk bit error for a 4 TB hard drive is around 10^-15 so the
> SHA1 hash value wins with a factor 10^22 which is a big margin.
> So it can be 10^22 times worse than perfect and still beat
> the hard drive error probability.

And in a later email you write:

> And now I just read on the Wikipedia page for SHA-1 that a theoretical
> weakness was discovered in 2011 that can find a collision with a
> complexity of 2^61, which gives a probability of 10^-18 still
> 1000 times better than a hard drive of 10^-15.

So the probability sinks as people discover something new about the hash function? Will it keep sinking as the hash function becomes better understood? That is a very subjective probability. Can we rely on this kind of probability calculation?

> Now you can read what you can find about cryptographic
> hash algorithms to convince yourself that the algorithms
> used by rsync and/or Git are good enough. [...]

I agree that I have to do that reading, although I was never interested in cryptography.

> The assumption of cryptographic hash functions being, according
> to their definition; reliable, is heavily used today. At least
> by rsync and Git. And there must be a lot of intelligent and
> hopefully skilled people backing that up.

We must hope, believe and pray.

Marc Espie <es...@nerim.net> wrote:

> A little knowledge is a dangerous thing.
>
> "weakness" in a cryptographic setting doesn't mean *anything* if
> you're using it as a pure checksum to find out accidental errors.

And now we are back to my starting point. The checksum is not used in rsync as a pure checksum to find accidental errors. That was my criticism.

From a checksum I expect two things: (1) the pre-images of the elements in the range all have similar sizes, and (2) it is very "discontinuous". The second property is what lets you use it to check the integrity of transmitted data: small changes produce a completely different checksum. What the checksum does when the changes are big does not play a role. Now, rsync concludes A=B from hash(A)=hash(B) even when A and B are completely different.

Are MD4 and SHA1 good? When we use rsync and git, we are part of a big empirical proof (or refutation) of that.

Rodrigo.
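
P.S. Since the point is easier to see with something concrete, here is a minimal sketch in Python (standard library only). Nothing below is rsync's or Git's actual code; the example strings, the name toy_checksum and the 8-bit checksum are my own inventions, only to illustrate the two properties and the A=B conclusion I mean above.

    import hashlib

    A = b"The quick brown fox jumps over the lazy dog"
    B = b"The quick brown fox jumps over the lazy cog"   # one letter changed

    # Property (2), "discontinuity": a tiny change in the input gives a
    # completely different SHA1 digest.
    print(hashlib.sha1(A).hexdigest())
    print(hashlib.sha1(B).hexdigest())

    # Concluding A=B from equal checksums: with a deliberately tiny 8-bit
    # checksum, two different inputs collide after a few hundred tries at most.
    def toy_checksum(data):
        return sum(data) % 256

    seen = {}
    for i in range(1000):
        blob = ("block %d" % i).encode()
        c = toy_checksum(blob)
        if c in seen and seen[c] != blob:
            print("collision: %r and %r both have checksum %d" % (seen[c], blob, c))
            break
        seen[c] = blob

    # As far as I understand, Git computes a blob id as SHA1 over a small
    # header plus the content; this should match `git hash-object` for the
    # same bytes.
    content = b"hello\n"
    header = ("blob %d" % len(content)).encode() + b"\x00"
    print(hashlib.sha1(header + content).hexdigest())

For a 160-bit digest like SHA1 the same brute-force collision search needs on the order of 2^80 inputs (the birthday bound), so the quoted probabilities hold only as long as SHA1 behaves like a perfect hash. Whether that assumption keeps holding as more is discovered about SHA1 is exactly my question.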