-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 According to Jim Meyering on 6/17/2009 1:21 PM: >> is not quite right. A hashing function may be somewhat ill-conditioned for >> one >> bucket size, but much better conditioned for another size. At any rate, it >> looks like the correct fix is to check for rehashing thresholds based on >> number >> of entries, not number of buckets, and that this check needs to happen on >> every >> insertion, not just insertions that land in an empty bucket. Or, at the very >> least, check for threshold PRIOR to insertion, rather than after, such that >> in >> the above case with m4, the very next call to hash_insert would notice the >> table was at 100%, and force another resize above 907 buckets. >> >> Thoughts, before I implement something along these lines? > > Sounds good.
Hey - I just noticed that this also fixed the bug where hash_insert was inconsistent on NULL return whether an entry had been inserted or not. Now, you are guaranteed that a NULL return means the entry was not inserted. Here's two simple patches. I'm committing the first, as there should be enough cases where you end up searching for an element already known to be hashed, where making an indirect function call is needless overhead (particularly if the comparator function is expensive, and neglected to do the pointer comparison filter itself). And by documenting this behavior, the user's comparator functions can now be safely rewritten to avoid the pointer comparison filter. I haven't pushed the second one to savannah yet; what do you think of it? Sometimes it is desirable to hash distinct pointers, in which case allowing a NULL user function to request this seems to make sense. My design choice was to provide trivial functions at initialization rather than penalize normal users by checking at every use of the callback whether the function was NULL. I've rebased my patch series, although the last couple of patches (my first cut at dealing with hash_rehash allocation failure, and rewriting overflow management to be more efficient) are rather untested at the moment, and might not even compile: http://repo.or.cz/w/gnulib/ericb.git $ git pull git://repo.or.cz/gnulib/ericb.git master - -- Don't work too hard, make some time for fun as well! Eric Blake e...@byu.net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAko6QU8ACgkQ84KuGfSFAYDwkACfZUk5ZbgoMc96pkGNCOdkhMFE YE8AoKbbrBBMYhrdBcX+DAt9SQXFZE7K =FWag -----END PGP SIGNATURE-----
>From 16908fc83e83a08e4f0e8c004141ff01ae12ca86 Mon Sep 17 00:00:00 2001 From: Eric Blake <e...@byu.net> Date: Thu, 18 Jun 2009 06:56:13 -0600 Subject: [PATCH 1/2] hash: minor optimization * lib/hash.c (hash_lookup, hash_find_entry): Avoid function call when possible. (hash_initialize): Document this promise. (hash_do_for_each, hash_clear, hash_free): Use C89 syntax. * tests/test-hash.c (hash_compare_strings): Test this. Signed-off-by: Eric Blake <e...@byu.net> --- ChangeLog | 9 +++++++++ lib/hash.c | 20 ++++++++++---------- tests/test-hash.c | 1 + 3 files changed, 20 insertions(+), 10 deletions(-) diff --git a/ChangeLog b/ChangeLog index 1b7cd78..09cc027 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,12 @@ +2009-06-18 Eric Blake <e...@byu.net> + + hash: minor optimization + * lib/hash.c (hash_lookup, hash_find_entry): Avoid function call + when possible. + (hash_initialize): Document this promise. + (hash_do_for_each, hash_clear, hash_free): Use C89 syntax. + * tests/test-hash.c (hash_compare_strings): Test this. + 2009-06-18 Bruno Haible <br...@clisp.org> * m4/strstr.m4 (gl_FUNC_STRSTR): Skip linear time test if strstr is diff --git a/lib/hash.c b/lib/hash.c index 4de1069..e464ce3 100644 --- a/lib/hash.c +++ b/lib/hash.c @@ -263,7 +263,7 @@ hash_lookup (const Hash_table *table, const void *entry) return NULL; for (cursor = bucket; cursor; cursor = cursor->next) - if (table->comparator (entry, cursor->data)) + if (entry == cursor->data || table->comparator (entry, cursor->data)) return cursor->data; return NULL; @@ -373,7 +373,7 @@ hash_do_for_each (const Hash_table *table, Hash_processor processor, { for (cursor = bucket; cursor; cursor = cursor->next) { - if (!(*processor) (cursor->data, processor_data)) + if (! processor (cursor->data, processor_data)) return counter; counter++; } @@ -535,7 +535,8 @@ check_tuning (Hash_table *table) The user-supplied COMPARATOR function should be provided. It accepts two arguments pointing to user data, it then returns true for a pair of entries that compare equal, or false otherwise. This function is internally called - on entries which are already known to hash to the same bucket index. + on entries which are already known to hash to the same bucket index, + but which are distinct pointers. The user-supplied DATA_FREER function, when not NULL, may be later called with the user data as an argument, just before the entry containing the @@ -628,7 +629,7 @@ hash_clear (Hash_table *table) for (cursor = bucket->next; cursor; cursor = next) { if (table->data_freer) - (*table->data_freer) (cursor->data); + table->data_freer (cursor->data); cursor->data = NULL; next = cursor->next; @@ -640,7 +641,7 @@ hash_clear (Hash_table *table) /* Free the bucket head. */ if (table->data_freer) - (*table->data_freer) (bucket->data); + table->data_freer (bucket->data); bucket->data = NULL; bucket->next = NULL; } @@ -670,9 +671,7 @@ hash_free (Hash_table *table) if (bucket->data) { for (cursor = bucket; cursor; cursor = cursor->next) - { - (*table->data_freer) (cursor->data); - } + table->data_freer (cursor->data); } } } @@ -769,7 +768,7 @@ hash_find_entry (Hash_table *table, const void *entry, return NULL; /* See if the entry is the first in the bucket. */ - if ((*table->comparator) (entry, bucket->data)) + if (entry == bucket->data || table->comparator (entry, bucket->data)) { void *data = bucket->data; @@ -796,7 +795,8 @@ hash_find_entry (Hash_table *table, const void *entry, /* Scan the bucket overflow. */ for (cursor = bucket; cursor->next; cursor = cursor->next) { - if ((*table->comparator) (entry, cursor->next->data)) + if (entry == cursor->next->data + || table->comparator (entry, cursor->next->data)) { void *data = cursor->next->data; diff --git a/tests/test-hash.c b/tests/test-hash.c index 2266545..73a3512 100644 --- a/tests/test-hash.c +++ b/tests/test-hash.c @@ -46,6 +46,7 @@ static bool hash_compare_strings (void const *x, void const *y) { + ASSERT (x != y); return STREQ (x, y) ? true : false; } -- 1.6.3.rc3.2.g4b51 >From 3096059ada9eba36cf7bbfcaa6f4d766b0f481a9 Mon Sep 17 00:00:00 2001 From: Eric Blake <e...@byu.net> Date: Thu, 18 Jun 2009 07:11:39 -0600 Subject: [PATCH 2/2] hash: provide default callback functions * lib/hash.c (raw_hasher, raw_comparator): New functions. (hash_initialize): Use them as defaults. * tests/test-hash.c (main): Test this. Signed-off-by: Eric Blake <e...@byu.net> --- ChangeLog | 5 +++++ lib/hash.c | 25 +++++++++++++++++++++---- tests/test-hash.c | 12 ++++++++++++ 3 files changed, 38 insertions(+), 4 deletions(-) diff --git a/ChangeLog b/ChangeLog index 09cc027..6861f6b 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,10 @@ 2009-06-18 Eric Blake <e...@byu.net> + hash: provide default callback functions + * lib/hash.c (raw_hasher, raw_comparator): New functions. + (hash_initialize): Use them as defaults. + * tests/test-hash.c (main): Test this. + hash: minor optimization * lib/hash.c (hash_lookup, hash_find_entry): Avoid function call when possible. diff --git a/lib/hash.c b/lib/hash.c index e464ce3..52a211e 100644 --- a/lib/hash.c +++ b/lib/hash.c @@ -479,6 +479,21 @@ hash_reset_tuning (Hash_tuning *tuning) *tuning = default_tuning; } +/* If the user passes a NULL hasher, we hash the raw pointer. */ +static size_t +raw_hasher (const void *data, size_t n) +{ + return (size_t) data % n; +} + +/* If the user passes a NULL comparator, we use pointer comparison. */ +static bool +raw_comparator (const void *a, const void *b) +{ + return a == b; +} + + /* For the given hash TABLE, check the user supplied tuning structure for reasonable values, and return true if there is no gross error with it. Otherwise, definitively reset the TUNING field to some acceptable default @@ -527,12 +542,12 @@ check_tuning (Hash_table *table) provided but the values requested are out of bounds or might cause rounding errors, return NULL. - The user-supplied HASHER function should be provided. It accepts two + The user-supplied HASHER function, when not NULL, accepts two arguments ENTRY and TABLE_SIZE. It computes, by hashing ENTRY contents, a slot number for that entry which should be in the range 0..TABLE_SIZE-1. This slot number is then returned. - The user-supplied COMPARATOR function should be provided. It accepts two + The user-supplied COMPARATOR function, when not NULL, accepts two arguments pointing to user data, it then returns true for a pair of entries that compare equal, or false otherwise. This function is internally called on entries which are already known to hash to the same bucket index, @@ -553,8 +568,10 @@ hash_initialize (size_t candidate, const Hash_tuning *tuning, { Hash_table *table; - if (hasher == NULL || comparator == NULL) - return NULL; + if (hasher == NULL) + hasher = raw_hasher; + if (comparator == NULL) + comparator = raw_comparator; table = malloc (sizeof *table); if (table == NULL) diff --git a/tests/test-hash.c b/tests/test-hash.c index 73a3512..6bb9652 100644 --- a/tests/test-hash.c +++ b/tests/test-hash.c @@ -135,6 +135,18 @@ main (void) hash_clear (ht); ASSERT (hash_get_n_entries (ht) == 0); hash_free (ht); + + /* Test pointer hashing. */ + ht = hash_initialize (sz, NULL, NULL, NULL, NULL); + ASSERT (ht); + { + char *str = xstrdup ("a"); + insert_new (ht, "a"); + insert_new (ht, str); + ASSERT (hash_lookup (ht, str) == str); + free (str); + } + hash_free (ht); } /* Now, each entry is malloc'd. */ -- 1.6.3.rc3.2.g4b51