On 15.02.2011 01:42, Johan Corveleyn wrote: > 2) Faster hash function for hashing lines. [ Quick win ] > > - Currently, we use adler32 (after the line has been normalized). > > - GNU diff uses some simple bit shifting scheme, which seems to be a > lot faster (maybe it has more hash-collisions, but that doesn't seem > to matter that much).
Is it (x << 5) + x? If so, that's the (in)famous "times 33" hash function, also see the essay in apr_hashfunc_default in http://svn.apache.org/viewvc/apr/apr/trunk/tables/apr_hash.c?view=markup I prefer APR's implementation, because a reasonably good compiler these days /should/ be able to optimize (x * 33) to ((x << 5) + x), depending on the characteristics of the target architecture. In other news, why do we need our own hash function anyway? Or our own table or tree or whatever? NIH syndrome? -- Brane