Matimus, I was surprised that "lazy" was the algorithm that won your time tests, and I saw a way to improve it even better (algorithm is O(# ones in number) rather than O(# bits in number)) def lazy2(a, b, bits=32): x = (a ^ b) & ((1 << bits) - 1) tot = 0 while x: tot += 1 x &= x-1 return tot
# times on my system (run a few times just to check for sure) python -mtimeit -s"from ham import *" "test(lazy)" 10000 loops, best of 3: 121 usec per loop python -mtimeit -s"from ham import *" "test(lazy2)" 10000 loops, best of 3: 62.4 usec per loop Check my math, but I think that's correct. Here's my derivation (but it's been a while since my Boolean algebra days, so I may've made a mistake!) It sounds right, though, since subtracting one in two's complement flips the rightmost one and inverts the zeros to the right of it. So, x & (x-1) would remove the rightmost one. Right? Long Derivation: The "trick" to finding the rightmost one in a number: pos = x ^ (x-1) It has to do with how two's-complement works. In our algorithm above, we are trying to count them, so we want to flip off the bits one by one from the right. So in each loop: x = x & ~pos But, then you notice you can simplify it even more (let y=x-1) and use mult/add syntax for & and | and use X=~x and Y=~y x * ~(x ^ y) x * ~(xY+Xy) [ def of ^ ] x * (~(xY)*~(Xy)) [ DeMoires Law ] x * ( (X+y)*(x+Y) ) [ inversion] x * (X+y) * (x+Y) [ associative] (xX+xy)*(x+Y) [ distributive ] xy*(x+Y) [ xX = 0 ] xy+xyY [ distrib ] xy [yY = 0] So, x &= x-1 On 19 Jun 2008, at 17:37, Matimus wrote: On Jun 19, 4:27 pm, godavemon <[EMAIL PROTECTED]> wrote: I need to calculate the Hamming Distance of two integers. The hamming distance is the number of bits in two integers that don't match. I thought there'd be a function in math or scipy but i haven't been able to find one. This is my function but it seems like there should be a faster way. I do this computation many times and speed up is important. def hamdist( a, b , bits = 32): def _hamdist( x, bits): if bits: return (x & 1) + _hamdist(x >> 1, bits-1) return x & 1 return _hamdist( a ^ b, bits) Another alternative would be to convert the XOR to a binary string and count the # of 1's. Which would be fastest? Are there better alternatives? Thanks! I see no good reason to use recursion for this type of thing. Here are some of my attempts: [code] from math import log def yours(a, b , bits = 32): def _hamdist( x, bits): if bits: return (x & 1) + _hamdist(x >> 1, bits-1) return x & 1 return _hamdist(a ^ b, bits) def simple(a, b, bits=32): x = a ^ b return sum((x >> i & 1) for i in xrange(bits)) def lazy(a, b, bits=32): x = (a ^ b) & ((1 << bits) - 1) tot = 0 while x: tot += x & 1 x >>= 1 return tot def fancy(a, b, bits=32): x = (a ^ b) & ((1 << bits) - 1) tot = 0 while x: tot += 1 x ^= 1 << int(log(x, 2)) return tot test_vals = ( ((0xffffffff, 0), 32), ((0,0), 0), ((1,0), 1), ((0x80000000, 0), 1), ((0x55555555, 0), 16) ) def test(f): test_vals = ( ((0xffffffff, 0), 32), # ALL ((0,0), 0), # None ((1,0), 1), # First ((0x80000000, 0), 1), # Last ((0x55555555, 0), 16), # Every Other ((0xffff, 0), 16), # First Half ((0xffff0000, 0), 16), # Last Half ) for i, (args, exp) in enumerate(test_vals): if f(*args) != exp: return 0 return 1 if __name__ == "__main__": for f in (yours, simple, lazy, fancy): if not test(f): print "%s failed"%f.__name__ [/code] The python module `timeit` is handy for testing speed: python -mtimeit -s"from hamdist import *" "test(yours)" 10000 loops, best of 3: 95.1 usec per loop python -mtimeit -s"from hamdist import *" "test(simple)" 10000 loops, best of 3: 65.3 usec per loop python -mtimeit -s"from hamdist import *" "test(lazy)" 10000 loops, best of 3: 59.8 usec per loop python -mtimeit -s"from hamdist import *" "test(fancy)" 10000 loops, best of 3: 77.2 usec per loop Even the ridiculous `fancy` version beat the recursive version. Matt -- http://mail.python.org/mailman/listinfo/python-list
-- http://mail.python.org/mailman/listinfo/python-list