Hi, Darshit Shah wrote: > I recently tried to use the hash table implementation in gnulib which resides > in the "hash" module. However, I quickly realised that the hash table in > gnulib > seems to be what is otherwise popularly known as a hash set, i.e., it supports > storing and retrieving just values from the structure. > > On the other hand, a hash table is usually expected to have a key->value > mapping that is stored.
I agree that the gnulib 'hash' module is just a particular case, and probably the module name is not very descriptive. > Within GNU Wget, we have a fairly portable version of a hash table implemented > which I think would be a good addition for gnulib. What do you think? There's not only the one from wget https://git.savannah.gnu.org/gitweb/?p=wget.git;a=blob;f=src/hash.h https://git.savannah.gnu.org/gitweb/?p=wget.git;a=blob;f=src/hash.c but also the one from gettext https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/hash.h https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/hash.c and the one from glib https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.h https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.c and the one from libxml https://gitlab.gnome.org/GNOME/libxml2/blob/master/include/libxml/hash.h https://gitlab.gnome.org/GNOME/libxml2/blob/master/hash.c and the ones from CLN https://www.ginac.de/CLN/cln.git/?p=cln.git;a=tree;f=src/base/hash and many more. The implementation you are proposing is an "open-addressed table with linear probing collision resolution". To me that is unacceptable. When I used Kyoto Common Lisp (KCL) many years ago, I got an endless loop during a hash table access, and it was precisely because of this open-addressed table structure. I don't want a code which requires careful setting of parameters in order not to run into an endless loop. Instead better have a code that cannot run into an endless loop *by design*. The hash_string function that you propose shifts by 5 bits at each step; I suspect that it has the same problem as the one I tested and discussed in https://haible.de/bruno/hashfunc.html . For Gnulib, I would want a generic container, i.e. a "map", like we have "list" and "ordered set" already (modules 'list' and 'oset'). Other GNU maintainers have reported that they like this approach. However, this will still not fit all possible needs because there are special cases that people want to see optimized: - The case when the key is a string; additionally when the key is allocated in an obstack and there is no remove. - The struniq function (as in localename.c). Then, what about extra requirements? - The existing gnulib 'hash' module is pretty unique: it keeps statistics. But is anyone really using this feature? - malloc vs. xmalloc. - Multithread-safety should IMO not be considered as an extra requirement. This is better done in application logic, because typically in the scope of the lock the application will do more than just the hash table lookup. Bruno