Jim Meyering <jim <at> meyering.net> writes: > Unless I become exceptionally unresponsive, > please post before pushing changes to hash.c. > > I would have requested that you not mix clean-up like this with > a semantics-changing change set:
Sorry for jumping the gun on that. I'll be a little more patient for the rest of my series, then. > > + hash: provide default callback functions > > + * lib/hash.c (raw_hasher, raw_comparator): New functions. > > + (hash_initialize): Use them as defaults. > > + * tests/test-hash.c (main): Test this. > > This looks fine, too. > Yes, I've often hashed pointer values. > However, note that the low bits of a pointer are often zero, > so that for small table sizes, this may be a very poor choice. Agreed (and these days, pointers are usually aligned to at least 8 bytes, not just 4). > size_t > hash_address (const void *addr, size_t table_size) > { > size_t k = (size_t) addr; > assert (k % 4 == 0); > return (k / 4) % table_size; > } That discards bits, in the case where non-malloced values are being used. The assertion is nice when you know that all values are malloc'd, but is harsh for a default. So, how about I squash this slight modification into my patch before pushing this commit? diff --git i/lib/hash.c w/lib/hash.c index 52a211e..c9866c5 100644 --- i/lib/hash.c +++ w/lib/hash.c @@ -483,7 +483,14 @@ hash_reset_tuning (Hash_tuning *tuning) static size_t raw_hasher (const void *data, size_t n) { - return (size_t) data % n; + /* When hashing unique pointers, it is often the case that they were + generated by malloc and thus have the property that the low-order + bits are 0. As this tends to give poorer performance with small + tables, we rotate the pointer value before performing division, + in an attempt to improve hash quality. */ + size_t val = data; + val = (val >> 3) | (val << (CHAR_BIT * sizeof val - 3)); + return val % n; } /* If the user passes a NULL comparator, we use pointer comparison. */