On Wed, Aug 4, 2010 at 2:22 PM, John Yates <j...@yates-sheets.org> wrote: > Here are two useful references: > > http://bretm.home.comcast.net/hash/ > http://burtleburtle.net/bob/hash/ > > RE computation cost: Generalized integer multiplication and integer > division both used to be expensive. On modern processors a > multiplication is fully pipelined and comparable to an L1 data cache > hit (~2 to 4 cycles). Most modern implementations of division run the > divide circuit at a multiple greater than 1 of the cpu clock > (typically 2x) and implement early out. Further division is nearly > always an autonomous, non-pipelined logic block that once initiated > runs asynchronously until it is ready to deliver its result. Bottom > line: on any contemporary out-of-order x86 chip from either Intel or > AMD the cost of either of these instruction is negligible.
I'm not a great expert of microarchitecture, but isn't the fact the implied division is the last "explicit" operation in a function, with nothing much after it to overlap with, going to serialise things thought? (The fact that I'm currently developing for Intel Atoms and ARM chips, both in-order chips, probably does skew my ideas on performance though. An instruction reference claims that it's latency is 50 instructions on an Atom, which is why I try and avoid unnecessary divisions.) > bucket selector. Nearly universally this is a modulo operation taking > the remainder of the digest divided by the number of buckets. A > particularly poor approach is to have a power of 2 number of buckets > so as to to reduce the modulo operation to simple masking. The reason > is that this discards all entropy in all masked out bits. A _much_ > better approach is to pick the number of buckets to be a prime number > -- the remainder of division by a prime number will be influenced by > _every_ bit in the digest. (When I need to implement a dynamic hash > table I always include a statically initialized array of primes chosen > to give me the growth pattern I desire.) If your hash function has appropriate avalanching and funneling so that you believe it is a good approximation to uniform, then there's no reason to believe that the entropy obtained by a using 2^n bits should be any better or worse than reducing modulo a prime. (Certainly if you look at most of the hash tables on Linux they're powers of 2, but I don't know if that means much.) That said, I haven't actually timed a full hash-table implementation to see if an integer division operation shows up on an optimised program using hash tables though. My bigger point was just that lots of the code snippets in K & R was written for a different time and is inappropriate in 2010. -- cheers, dave tweed__________________________ computer vision reasearcher: david.tw...@gmail.com "while having code so boring anyone can maintain it, use Python." -- attempted insult seen on slashdot