Dave Sill <[EMAIL PROTECTED]> wrote:
>
> The first thing I did was Google for "hash prime modulo even
> distribution". That turns up many repetitions of Charles' assertion,
> without proof or explanation.
[...]
> Being a ``Profile, don't speculate'' kind of guy, I decided to write a
> little program to test modulo hashes, which is attached to this
> message for your entertainment.
>
> The result is that I can't see any effect of primality of the hash
> table size on distribution.
Funny, I can reproduce it easily. Sure your random numbers are random?
With the attached Python script (it depends on the popular stats.py and
pstat.py modules), analyzing 250000 15-bit random integers (read from a text
file, one per line), I get the following:
[charlesc@charon personal]$ ./buckets.py 12 13
A: 12 buckets
count: 250000
mean: 20833.3333
std. dev.: 196.3153
B: 13 buckets
count: 250000
mean: 19230.7692
std. dev.: 103.8646
The effect does appear to diminish significantly for large values, though.
Charles
--
-----------------------------------------------------------------------
Charles Cazabon <[EMAIL PROTECTED]>
GPL'ed software available at: http://www.qcc.sk.ca/~charlesc/software/
-----------------------------------------------------------------------
#!/usr/bin/python
import whrandom
import string
import stats
import sys
ints = map (int, map (string.strip, open ('randints.txt').readlines ()))
#######################################
def fill_buckets (numbuckets):
buckets = [0] * numbuckets
for i in ints:
x = i % numbuckets
buckets[x] = buckets[x] + 1
return buckets
#######################################
def main (a, b):
A = fill_buckets (a)
B = fill_buckets (b)
for L, name in ((A, 'A'), (B, 'B')):
sum = reduce (lambda a, b: a+b, L)
mean = stats.mean (L)
dev = stats.stdev (L)
print '%s: %i buckets' % (name, len (L))
print ' count: %10i' % sum
print ' mean: %7.04f' % mean
print ' std. dev.: %7.04f' % dev
print
#######################################
if __name__ == '__main__':
a = int (sys.argv[1])
b = int (sys.argv[2])
main (a, b)