Re: sha-2 sum of files?

Dr.Ruud Fri, 14 Jun 2013 00:08:15 -0700

On 14/06/2013 08:02, Shlomi Fish wrote:

On Thu, 13 Jun 2013 22:51:24 +0200
lee <l...@yun.yagibdah.de> wrote:

How likely is it that the hash is the same though the file did change?


Well, if you take SHA-256 for example, then its hash has 256 bits so you have a
chance of about 1 / (2**256) that two given non-identical byte vectors will have
the same hash.

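If you only store the hash value, and none of the filename, filesize, etc. with it, then the 'birthday paradox' comes into play: among many stored hashes, the chance that some pair clashes is much larger than the chance for one given pair.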

http://en.wikipedia.org/wiki/Birthday_problem

For example, I have about 100M (roughly 2**27) email-addresses, each representing a customer.

I gave them a pseudo-id: the left half of the MD5 of the (normalized) email-address. A handy unsigned int of 64 bits. (MySQL: bigint unsigned)
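
A minimal sketch of how such a pseudo-id could be derived in Perl. The helper name pseudo_id and the normalization shown (trim whitespace, lowercase) are assumptions for illustration only; the 'Q>' unpack template needs a perl built with 64-bit integer support.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical helper; the normalization rules are an assumption.
sub pseudo_id {
    my ($address) = @_;
    (my $norm = lc $address) =~ s/^\s+|\s+$//g;   # lowercase, then trim
    # md5() returns 16 raw bytes; read the left 8 as a big-endian
    # unsigned 64-bit integer (requires 64-bit integer support).
    return unpack 'Q>', substr(md5($norm), 0, 8);
}

printf "%u\n", pseudo_id('  Some.Body@Example.COM ');
```

The resulting value fits a 64-bit unsigned column as-is.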


No clashes yet, but in this case any clashes are expected and acceptable.
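
The odds can be estimated with the usual birthday approximation, p ~ 1 - exp(-n(n-1)/(2N)), for n items in a space of N values. This is only a rough sketch, and it assumes the ids behave like uniformly random values; the function name clash_probability is made up for this example. For 100M ids of 64 bits it comes out on the order of 0.03%, which fits with not having seen a clash yet.

```perl
use strict;
use warnings;

# Approximate chance that at least two of $n random values collide
# when drawn uniformly from $space possible values.
sub clash_probability {
    my ($n, $space) = @_;
    my $x = $n * ($n - 1) / (2 * $space);
    # For tiny $x, 1 - exp(-$x) rounds to 0 in floating point,
    # so fall back to the first-order term $x itself.
    return $x < 1e-8 ? $x : 1 - exp(-$x);
}

printf "64-bit ids, 1e8 items: %.3g\n", clash_probability(1e8, 2**64);
printf "SHA-256,    1e8 items: %.3g\n", clash_probability(1e8, 2**256);
```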

Also easy to test: generate 100M email-address-like strings,
and count the MD5/2-clashes.
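
A scaled-down sketch of such a test, under some assumptions: the generated addresses (user1@example.com, user2@example.com, ...) and the name count_clashes are invented for the example, and the run uses 200,000 strings rather than 100M, since an in-memory Perl hash of 100M keys needs a lot of RAM. The second call uses only 2 bytes of the MD5, so that clashes actually show up.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Count how many of $n generated strings get an id that was already seen.
# $bytes is how many leading MD5 bytes form the id (8 = the MD5/2 case).
sub count_clashes {
    my ($n, $bytes) = @_;
    my %seen;
    my $clashes = 0;
    for my $i (1 .. $n) {
        my $id = substr(md5("user$i\@example.com"), 0, $bytes);
        $clashes++ if $seen{$id}++;   # true when this id was seen before
    }
    return $clashes;
}

print "8-byte ids: ", count_clashes(200_000, 8), " clashes\n";
print "2-byte ids: ", count_clashes(200_000, 2), " clashes\n";
```

For the full 100M it may be more practical to write the ids to a file and look for duplicates with sort and uniq -d.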

--
Ruud


