On Tue, Jun 22, 2021 at 1:55 PM David Rowley <dgrowle...@gmail.com> wrote: > On Tue, 22 Jun 2021 at 03:43, Tom Lane <t...@sss.pgh.pa.us> wrote: > > I kind of wonder if we really need four different hash table > > implementations (this being the third "generic" one, plus hash join > > has its own, and I may have forgotten others). Should we instead > > think about revising simplehash to gain the benefits of this patch? > > hmm, yeah. I definitely agree with trying to have as much reusable > code as we can when we can. It certainly reduces maintenance and bugs > tend to be found more quickly too. It's a very worthy cause.
It is indeed really hard to decide when you have a new thing, and when you need a new way to parameterise the existing generic thing. I wondered about this how-many-hash-tables-does-it-take question a lot when writing dshash.c (a chaining hash table that can live in weird dsa.c memory, backed by DSM segments created on the fly that may be mapped at different addresses in each backend, and has dynahash-style partition locking), and this was around the time Andres was talking about simplehash. In retrospect, I'd probably kick out the built-in locking and partitions and instead let callers create their own partitioning scheme on top from N tables, and that'd remove one quirk, leaving only the freaky pointers and allocator. I recall from a previous life that Boost's unordered_map template is smart enough to support running in shared memory mapped at different addresses just through parameterisation that controls the way it deals with internal pointers (boost::unordered_map<..., ShmemAllocator>), which seemed pretty clever to me, and it might be achievable to do the same with a generic hash table for us that could take over dshash's specialness. One idea I had at the time is that the right number of hash table implementations in our tree is two: one for chaining (like dynahash) and one for open addressing/probing (like simplehash), and that everything else should be hoisted out (locking, partitioning) or made into template parameters through the generic programming technique that simplehash.h has demonstrated (allocators, magic pointer type for internal pointers, plus of course the inlinable ops). But that was before we'd really fully adopted the idea of this style of template code. (I also assumed the weird memory stuff would be temporary and we'd move to threads, but that's another topic for another thread.) It seems like you'd disagree with this, and you'd say the right number is three. But it's also possible to argue for one... A more superficial comment: I don't like calling hash tables "hash". I blame perl.