> I was trying to see how some have implemented a hashtable. I took a gather > at dictobject.h/.c. It seems that underneath it all it's a linked list and > that is used in order to store the actual information (I'm looking at > PyDictEntry.) > > Am I correct in my assumption or is there more to this? I'm still looking > into how new entries are handled.
The Python dictionary implementation has been pretty well optimized over the years, so it might not be the most easy-to-read code. You might actually try and latch onto a copy of dictobject.[ch] from a very old version of Python (1.5-ish). That will have much less in the way of speed tricks obfuscating the code. Very generally (I'm writing with a lot of water under the bridge since I last thought about this), a dictionary contains an array whose length is typically a power of two (2**n). When considering a key for insertion or lookup, a hash value is computed, and the last n bits of the resulting value are used as an index into that array. Each element of the array consists of a linked list of all the key/value pairs whose keys hash to that value. Once you've identified an element in the hash array, the linked list is traversed looking for the key. There are three operations: get, set, delete. Each operation has one of two actions to perform depending whether the key is found or not: * get - if found, return the corresponding value, if not, raise KeyError * set - if found, replace the old value with the new, if not, add a new key/value pair to the list * del if found, delete the key/value pair, if not, raise KeyError The challenge is to come up with a reasonable size hash array and a suitable hash function which balances the desire to not chew up all of memory with the desire to have very short lists. In Python's case, I believe that dictionaries with strings as keys (and the hash function used for strings) have been optimized for how Python's runtime works. Skip -- https://mail.python.org/mailman/listinfo/python-list