On Tue, 22 Jun 2021 at 03:43, Tom Lane <t...@sss.pgh.pa.us> wrote: > I kind of wonder if we really need four different hash table > implementations (this being the third "generic" one, plus hash join > has its own, and I may have forgotten others). Should we instead > think about revising simplehash to gain the benefits of this patch?
hmm, yeah. I definitely agree with trying to have as much reusable code as we can when we can. It certainly reduces maintenance and bugs tend to be found more quickly too. It's a very worthy cause. I did happen to think of this when I was copying swathes of code out of simplehash.h. However, I decided that the two implementations are sufficiently different that if I tried to merge them both into one .h file, we'd have some unreadable and unmaintainable mess. I just don't think their DNA is compatible enough for the two to be mated successfully. For example, simplehash uses open addressing and generichash does not. This means that things like iterating over the table works completely differently. Lookups in generichash need to perform an extra step to fetch the actual data from the segment arrays. I think it would certainly be possible to merge the two, but I just don't think it would be easy code to work on if we did that. The good thing is that that the API between the two is very similar and it's quite easy to swap one for the other. I did make changes around memory allocation due to me being too cheap to zero memory when I didn't need to and simplehash not having any means of allocating memory without zeroing it. I also think that there's just no one-size-fits-all hash table type. simplehash will not perform well when the size of the stored element is very large. There's simply too much memcpying to move data around during insert/delete. simplehash will also have terrible iteration performance in sparsely populated tables. However, simplehash will be pretty much unbeatable for lookups where the element type is very small, e.g single Datum, or an int. The CPU cache efficiency there will be pretty much unbeatable. I tried to document the advantages of each in the file header comments. I should probably also add something to simplehash.h's comments to mention generichash.h David