On Wed, Aug 11, 2021 at 8:51 AM John Naylor <john.nay...@enterprisedb.com> wrote: > (Standard disclaimer that I'm not qualified to design index AMs) I've seen > one mention in the literature about the possibility of simply having a btree > index over the hash values. That would require faster search within pages, in > particular using abbreviated keys in the ItemId array of internal pages [1] > and interpolated search rather than pure binary search (which should work > reliably with high-entropy keys like hash values), but doing that would speed > up all btree indexes, so that much is worth doing regardless of how hash > indexes are implemented. In that scheme, the hash index AM would just be > around for backward compatibility.
I think that it's possible (though hard) to do that without involving hashing, even for datatypes like text. Having some kind of prefix compression that makes the final abbreviated keys have high entropy would be essential, though. I agree that it would probably be significantly easier when you knew you were dealing with hash values, but even there you need some kind of prefix compression. In any case I suspect that it would make sense to reimplement hash indexes as a translation layer between hash index opclasses and nbtree. Robert said "Likewise, the fact that keys are stored in hash value order within pages but that the bucket as a whole is not kept in order seems like it's bad for search performance". Obviously we've already done a lot of work on an index AM that deals with a fully ordered keyspace very well. This includes dealing with large groups of duplicates gracefully, since in a certain sense there are no duplicate B-Tree index tuples -- the heap TID tiebreaker ensures that. And it ensures that you have heap-wise locality within these large groups, which is a key enabler of things like opportunistic index deletion. When hash indexes have been used in database systems, it tends to be in-memory database systems where the recovery path doesn't recover indexes -- they're just rebuilt from scratch instead. If that's already a baked-in assumption then hash indexes make more sense. To me it seems like the problem with true hash indexes is that they're constructed in a top-down fashion, which is approximately the opposite of the bottom-up, incremental approach used by B-Tree indexing. This seems to be where all the skew problems arise from. This approach cannot be robust to changes in the keyspace over time, really. -- Peter Geoghegan