On Sun, 7 Jun 2015 04:27 pm, Cecil Westerhof wrote:

> I wrote a very simple function to test random:
>
>     def test_random(length, multiplier = 10000):
>         number_list = length * [0]
>         for i in range(length * multiplier):
>             number_list[random.randint(0, length - 1)] += 1
>         minimum = min(number_list)
>         maximum = max(number_list)
>         return (minimum, maximum, minimum / maximum)
Putting aside the timing aspects, your frequency calculations are not done
in a very Pythonic manner. A better way might be:

from collections import Counter
from random import randint

def test_random(length, multiplier=10000):
    freq = Counter(
        randint(0, length - 1) for i in range(length * multiplier)
        )
    minimum = min(freq[i] for i in range(length))
    maximum = max(freq.values())
    return (minimum, maximum, minimum / maximum)

Note carefully the way I calculate the minimum value. That way, if by
chance some number has a frequency of zero, the minimum returned will be
zero. For the maximum, that doesn't matter, since any missing numbers will
not be the most frequent number.

Better than returning the min and max, you should return the counter
object itself. Before doing so, I make sure that any missing numbers are
given a frequency of zero:

def test_random(size, samples=1000000):
    """Return the frequencies of `samples` random numbers from 0 to size-1."""
    freq = Counter(randint(0, size-1) for i in range(samples))
    # Give any missing numbers a frequency of zero. (Likely for small samples.)
    freq.subtract(dict.fromkeys(range(size), 0))
    return freq

Now once you have the frequencies, you can look at many interesting
statistics, and verify that you have a valid sample:

py> f = test_random(100)
py> sum(f.values()) == 1000000  # Total number of samples is right?
True
py> sorted(f) == list(range(100))  # Any missing numbers?
True
py> max(f) - min(f)  # statistical range
99
py> min(f.values())  # Lowest frequency?
9775
py> max(f.values())  # And the highest?
10313

The ratio of those two frequencies isn't very interesting, but we can
calculate it:

py> min(f.values())/max(f.values())
0.9478328323475226

For a uniformly distributed population, that ratio will be 1.0 exactly,
but for any finite sample it could be any number between 0 and 1. We
expect that if we have a large random sample, the ratio should approach 1,
but moderate deviations from that are expected. In fact, we should be
surprised if the ratio is exactly 1, and suspect shenanigans if it is.

We can extract the mode, and its frequency:

py> f.most_common(1)  # mode
[(85, 10313)]

In Python 3.4 or better, we can look at some more statistics:

py> import statistics
py> statistics.median(f.elements())
50.0
py> statistics.mean(f.elements())
49.531949
py> statistics.stdev(f.elements())
28.873178122988367

And verify that the mode is correct:

py> statistics.mode(f.elements())
85

We can see what happens if the data is fiddled with:

py> g = f.copy()
py> g[42] += g[23]
py> g[23] = 0
py> statistics.median(g.elements())
50.0
py> statistics.mean(g.elements())
49.7263
py> statistics.stdev(g.elements())
28.757647388343212
py> g.most_common(3)
[(42, 20258), (85, 10313), (56, 10208)]

The median is unchanged, the mean shifts slightly higher, and the standard
deviation decreases slightly. But as you can see, these are not
particularly powerful tests of randomness: only the mode shows an obvious
deviation from uniformity.
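A more systematic check than eyeballing the mode is Pearson's chi-squared
goodness-of-fit test. Here is a minimal sketch using nothing outside the
standard library; the name chi_squared is my own, the rule of thumb below
is only approximate, and for an exact p-value you would reach for
something like scipy.stats.chisquare:

def chi_squared(freq, size, samples):
    """Pearson's chi-squared statistic for the observed frequencies in
    `freq`, assuming each value in range(size) is equally likely over
    `samples` draws."""
    expected = samples / size
    # Sum the squared deviations from the expected count, scaled by
    # the expected count, over every possible value.
    return sum((freq[i] - expected)**2 / expected for i in range(size))

For a truly uniform generator, chi_squared(test_random(100), 100, 1000000)
should land somewhere near 99 (the number of degrees of freedom), give or
take a few multiples of sqrt(2*99), which is about 14. Feed it the fiddled
counter g from above instead and it should come back well over twenty
thousand (the emptied bin 23 alone contributes 10000), a much louder alarm
than the mode by itself.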
> But what are the ranges you would expect with a good random function.
> Then it could be used to test a random function.

Python's pseudo-random number generator is based on the Mersenne Twister.
This is one of the highest-quality and most extensively tested PRNGs
available, with a period of 2**19937-1 before it repeats. Its randomness
properties are very good indeed. (Although of course that isn't to say
that there aren't bugs in the Python implementation.)

The Mersenne Twister is not suitable for cryptographic work, but apart
from that, it is pretty much as indistinguishable from random as any
deterministic computer program can be.


--
Steven