On 2018-12-27 14:22:11 -0500, Tom Lane wrote: > Andrew Dunstan <andrew.duns...@2ndquadrant.com> writes: > > A smaller project might be to see if we can replace the binary keyword > > search in ScanKeyword with a perfect hashing function generated by > > gperf, or something similar. I had a quick look at that, too. > > Yeah, we've looked at gperf before, eg > > https://www.postgresql.org/message-id/20170927183156.jqzcsy7ocjcbd...@alap3.anarazel.de > > Perhaps it'd be a win but I'm not very convinced.
Note that the tradeoffs mentioned there, by memory, aren't necessarily applicable here. As we're dealing with strings anyway, gperf wanting to deal with strings rather than being able to deal with numbers isn't problematic. > I don't know much about the theory of perfect hashing, but I wonder > if we could just roll our own tool for that. Since we're not dealing > with extremely large keyword sets, perhaps brute force search for a > set of multipliers for a hash computation like > (char[0] * some_prime + char[1] * some_other_prime ...) mod table_size > would work. The usual way to do do perfect hashing is to bascially have a two stage hashtable, with the first stage keyed by a "normal" hash fuinction, and the second one disambiguating the values that hash into the same bucket, by additionally keying a hash-function with the value in the cell in the intermediate hash table. Determining the parameters in the intermediate table is what takes time. That most perfect hash functions look like that way is also a good part of the reason why I doubt it's worthwhile to go there over a simple linear probing hashtable, with a good hashfunction - computing two hash-values will usually be worse than linear probing for *small* and *not modified* hashtables. A simple (i.e. slow for large numbers of keys) implementation for generating a perfect hash function isn't particularly hard. E.g. look at the python implementation at http://iswsa.acm.org/mphf/index.html and http://stevehanov.ca/blog/index.php?id=119 for an easy explanation with graphics. Greetings, Andres Freund