Re: Further diff optimization ideas

Branko Čibej Tue, 15 Feb 2011 00:56:07 -0800

On 15.02.2011 01:42, Johan Corveleyn wrote:
> 2) Faster hash function for hashing lines. [ Quick win ]
>
> - Currently, we use adler32 (after the line has been normalized).
>
> - GNU diff uses some simple bit shifting scheme, which seems to be a
> lot faster (maybe it has more hash-collisions, but that doesn't seem
> to matter that much).


Is it (x << 5) + x? If so, that's the (in)famous "times 33" hash
function, also see the essay in apr_hashfunc_default in
http://svn.apache.org/viewvc/apr/apr/trunk/tables/apr_hash.c?view=markup

I prefer APR's implementation, because a reasonably good compiler these
days /should/ be able to optimize (x * 33) to ((x << 5) + x), depending
on the characteristics of the target architecture.

In other news, why do we need our own hash function anyway? Or our own
table or tree or whatever? NIH syndrome?

-- Brane

Re: Further diff optimization ideas

Reply via email to