On Tue, Dec 10, 2019 at 4:59 PM Andres Freund <and...@anarazel.de> wrote:
> 3) For lots of one-off uses of hashtables that aren't performance
> critical, we want a *simple* API. That IMO would mean that key/value
> end up being separately allocated pointers, and that just a
> comparator is provided when creating the hashtable.
I think the simplicity of the API is a key point. Some things that are
bothersome about dynahash:

- It knows about memory contexts and insists on having its own.
- You can't just use a hash table in shared memory; you have to
  "attach" to it first and have an object in backend-private memory.
- The usual way of getting a shared hash table is ShmemInitHash(), but
  that means that the hash table has its own named chunk and that it's
  in the main shared memory segment. If you want to put it inside
  another chunk or put it in DSM or whatever, it doesn't work.
- It knows about LWLocks, and if it's a shared table it needs its own
  tranche of them.
- hash_search() is hard to wrap your head around.

One thing I dislike about simplehash is that the #define-based
interface is somewhat hard to use. It's not that it's a bad design;
it's just that you have to sit down and think for a while to figure
out which things you need to #define in order to get it to do what you
want. I'm not sure that's something that can or needs to be fixed, but
it's something to consider. Even dynahash, as annoying as it is, is in
some ways easier to get up and running.

Probably the two most common use cases are: (1) a fixed-size shared
memory hash table of fixed-size entries where the key is the first N
bytes of the entry and it never grows, or (2) a backend-private or
perhaps frontend hash table of fixed-size entries where the key is the
first N bytes of the entry, and it grows without limit. I think we
should consider having specialized APIs for those two cases and then
more general APIs that you can use when that's not enough.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company