On 14/06/2013 08:02, Shlomi Fish wrote:
> On Thu, 13 Jun 2013 22:51:24 +0200
> lee <l...@yun.yagibdah.de> wrote:
>> How likely is it that the hash is the same though the file did change?
>
> Well, if you take SHA-256 for example, then its hash has 256 bits, so you have a
> chance of about 1 / (2**256) that two given non-identical byte vectors will have
> the same hash.
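If you store only the hash value, and none of the filename, filesize,
etc. with it, then you run into the 'birthday paradox': with many stored
hashes, the chance that some pair of them collides is far higher than the
chance for one given pair.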
http://en.wikipedia.org/wiki/Birthday_problem
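
To put rough numbers on that, here is a small Python sketch of the usual
birthday-bound approximation, p ~ 1 - exp(-n*(n-1) / 2**(b+1)), for n items
and a b-bit hash. It assumes the hash values behave like uniformly random
numbers; the function name collision_probability is made up for this example.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items values collide,
    assuming each value is a uniformly random hash_bits-wide number."""
    # Expected number of colliding pairs: n*(n-1)/2 pairs, each 1/2**bits likely.
    expected_pairs = n_items * (n_items - 1) / 2 / 2 ** hash_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-expected_pairs)

if __name__ == "__main__":
    # 100M items at several hash widths.
    for bits in (32, 64, 128, 256):
        print(bits, collision_probability(100_000_000, bits))
```

Under that assumption, 100M items give about a 0.03% chance of at least one
clash at 64 bits, while at 32 bits a clash is practically certain.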
For example, I have about 100M (roughly 2**27) email addresses, each
representing a customer.
I gave each of them a pseudo-id: the left half of the MD5 of the (normalized)
email address. That is a handy unsigned int of 64 bits. (MySQL: bigint unsigned)
No clashes yet, but in this case any clashes are expected and acceptable.
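
A pseudo-id like that could be computed roughly as follows. This is only a
Python sketch, not the code actually used: the normalization (trim and
lowercase), the big-endian byte order, and the name pseudo_id are all
assumptions.

```python
import hashlib

def pseudo_id(email: str) -> int:
    """Left half (first 8 bytes) of the MD5 digest as an unsigned 64-bit int."""
    # Assumed normalization: strip surrounding whitespace and lowercase.
    normalized = email.strip().lower()
    digest = hashlib.md5(normalized.encode("utf-8")).digest()  # 16 raw bytes
    # Big-endian, so the result matches the first 16 hex digits of the MD5.
    return int.from_bytes(digest[:8], "big")

if __name__ == "__main__":
    print(pseudo_id("  Some.Customer@Example.COM "))
```

The result always fits in 64 bits, so it can be stored in an unsigned
64-bit column.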
This is also easy to test: generate 100M email-address-like strings
and count the clashes among their half-MD5 values.
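
Such a test might look like the Python sketch below. The generated addresses
(user0@example.com, user1@example.com, ...) and the name count_clashes are
made up for illustration, and the bits parameter is an addition that lets you
shrink the ID so clashes actually show up at small n. Keeping every ID in an
in-memory set is fine for a quick run, but 100M entries would likely need
several gigabytes of RAM.

```python
import hashlib

def count_clashes(n: int, bits: int = 64) -> int:
    """Count how many of n synthetic addresses get an ID that is already taken."""
    seen = set()
    clashes = 0
    for i in range(n):
        address = f"user{i}@example.com"  # made-up, pairwise distinct strings
        digest = hashlib.md5(address.encode("ascii")).digest()
        # Left half of the MD5, then keep only its top `bits` bits.
        ident = int.from_bytes(digest[:8], "big") >> (64 - bits)
        if ident in seen:
            clashes += 1
        else:
            seen.add(ident)
    return clashes

if __name__ == "__main__":
    n = 200_000  # raise this if you have the memory and the patience
    print("64-bit ids:", count_clashes(n, bits=64))
    print("32-bit ids:", count_clashes(n, bits=32))
```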
--
Ruud