Re: Using svn_hash__make instead of apr_hash__make

Stefan Fuhrmann Mon, 04 Jun 2012 03:16:05 -0700

Justin Erenkrantz wrote:

On Tue, May 22, 2012 at 12:35 PM, Stefan Fuhrmann<[email protected]>  wrote:

On a more general note: We don't use hashes as a means to
randomize our data. For us, they are simply containers with
an average O(1) insertion and lookup behavior. The APR interface
also allows for iterating that container - so it *has* an ordering
and it has been quite stable under different operations and
over many versions of APR.

Iterating isn't the same as ordering.


You are right. I got carried away ...

And, the third point doesn't make any sense to me without a further
explanation.  -- justin

When we e.g. do an "svn ls -v" (TSVN repo browser), we will
create and fill the revprop hash for the respective revision
multiple times for each entry in the directory - just to produce
a few bytes of output. The hash function showed up in profiles
with 10% of the total runtime.

Sorry - I'm not following how having a sorted hash table achieves
this; it sounds like reuse of a data structure (not requiring memory
allocs or something) - not that it is guaranteed to be stable on every
call.


This is not linked to the "stable order" part but to performance
(hashfunc_compatible is a bit faster than the APR default).

So, I tuned that. Because apr_hash_t is so popular in our code,
that very localized improvement will give (small) returns in
improved performance all over the place.

Yes, but there may very well be penalties in other places that you
haven't looked at yet.  I know that I've seen shift+incr appear in
traces before - switching to a much more expensive hash algorithm is
going to make those cases worse.  -- justin

I'm not switching to a much more expensive hash algorithm
(if that is what you are implying). And I'm not suggesting that
APR switched to a different function either.

However, I think that APR could detect the "huge bucket"
case with very little overhead - just check / update a counter
on the respective bucket. Once it detected the critical scenario,
it should switch that the data structure for the respective
bucket from a linked list to e.g. a b-tree.

-- Stefan^2.

Re: Using svn_hash__make instead of apr_hash__make

Reply via email to