Re: question: why isn't a byte of a hash more uniform? how could I improve my code to cure that?

László Sándor Fri, 07 Aug 2009 11:51:30 -0700

Thank you, Tim. My comments are below.

On 2009-08-07 13:19:47 -0400, Tim Chase <[email protected]> said:

After I have written a short Python script that hashes my textfile line by
line and collects the numbers next to the original, I checked what I got.
Instead of getting around 25% in each treatment, the range is 17.8%-31.3%.
That sounds suspiciously like 25% with a +/- 7% fluctuation one mightexpect to see from non-random source data.

Could you help me where this range comes from? (If all my lines were"42", I wouldn't hit this range. So it cannot be a rule. Right?)

Remember that your outputs are driven purely by your inputs in adeterministic fashion -- if your inputs are purely random, then youroutputs should more closely match your expected bin'ing. If yourinputs aren't random, you get a taste of your own medicine ("my filehas just the number 42 on every line...why isn't my output random?").And randomness-of-hash-output is a red herring since hashing is *not*random.

Thanks, I tried to be correct with "pseudo-random". I understand thateverything is dependent on input. I want it to be the case. However, Ihoped that good hashes produce random-looking output from input withvery little variation. It would be strange if I could not get more than18% of lines with a remainder of 3 (after division by 4), whatever hashI try just because these are names of people.

Your input is also finite -- an aspect which leaves you a far cry fromthe full hash-space. If an md5 has 32 bytes (256 bits) of data, yourinput would have to cover 2**256 possible inputs to see the fullprofile of your outputs. That's a lot of input :)
-tkc


OK, I understand. Could anyone suggest a better way to do this, then?

(Recap: random-looking, close-to uniform assignment of one number outof four possibilities to strings.)


Thanks, everyone.

Laszlo

--
http://mail.python.org/mailman/listinfo/python-list

Re: question: why isn't a byte of a hash more uniform? how could I improve my code to cure that?

Reply via email to