Re: [PATCH] hash: declare some functions with the warn_unused_result attribute

Eric Blake Tue, 16 Jun 2009 09:13:47 -0700

Jim Meyering <jim <at> meyering.net> writes:

> Long ago (like 12-15 years), in an application that made very heavy use
> of hash tables, the obstack code provided a large performance benefit.


Indeed, if you are frequently calling allocate_entry, the cost of malloc'ing 
lots of small objects is measurably worse than using an obstack to malloc large 
chunks and carve off entries as needed.  In other words, it's not the speedup 
from freeing multiple entries, so much as the speedup from dealing with 
collisions when inserting them along with a reduction in memory usage, that 
makes obstack-managed entry allocation attractive even with modern heap 
management.

For that matter, my patch to hash_rehash penalizes the USE_OBSTACK case by 
calling obstack_alloc and updating the free_entry_list once per needed free 
entry, where it could instead use obstack_grow to guarantee enough space 
without bothering with free_entry_list, so if we spend time improving 
USE_OBSTACK, I will work on a followup to provide this optimization.

> So I'm reluctant to remove it altogether.  However, it's been so long
> since I last tested it that it may well have suffered from bit rot.
> And with modern heap management code, it may not be useful anymore.

My code inspection didn't turn up obvious problems, but I have not yet tested 
with USE_OBSTACK either, so I wouldn't be surprised to find bit rot.  If we do 
get it working, would it be worth making the use of obstacks a runtime decision 
via hash_tuning, rather than a compile-time decision?

There might also be alternative implementations possible with better 
performance.  For example, instead of malloc'ing free entries one at a time and 
tracking bucket overflows as pointers to malloc'd objects, we could instead 
malloc/realloc an array of overflows, with bucket overflows tracked as indices 
into that array.  This would provide some of the same benefits as obstack 
allocation (no per-entry malloc overhead, and fewer calls to malloc and with 
larger chunks of malloc claimed per call), such that it is no longer necessary 
to use an obstack for free-entry management in the first place.

Another hash implementation I am familiar with treats the slots array as a 
circular array, and has no overflow linked lists (thus, the entire hash table 
is contained within a single malloc'd block, rather than the hash.c 
implementation of one block for bucket heads and other blocks for collisions).  
Instead, the hasher merely determines where in the circular array to start 
searching, and on collisions, you end up searching until you hit a free slot.  
But this has other ramifications; searching for a match means traversing 
through the array until you encounter an empty slot, and you no longer have the 
guarantee that the comparison function will always be called with two elements 
with the same hash.  Also, whereas linked lists only search the set of 
collisions, a nearly full circular array may end up searching the bulk of the 
table before it arrives at the next free entry.  However, I have never 
benchmarked this implementation to see how it fares when compared with the more 
traditional bucket of linked lists.

We may also want to borrow from Bruno's idioms in the gl_list arena, where 
hash_initialize is implemented as setting up a vtable optimized to an 
appropriate implementation, such that you could then experiment with different 
implementations by swapping the vtable at initialization time.

> 
> I've begun writing a test module, and may even find the time to test
> the USE_OBSTACK code, too.  As you say, it also needs to be improved
> to handle allocation failure, probably the same way remove.c now does:
> 
>   http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=a5111af33ea6a5d27

Unrelated to this topic, but in looking at that patch, I noticed you added the 
following line:

 cleanup:;

Any reason for the empty statement after the label?  Is this worth a syntax 
check in maint.mk?

-- 
Eric Blake

Re: [PATCH] hash: declare some functions with the warn_unused_result attribute

Reply via email to