On Sun, Jun 2, 2019 at 4:07 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Anyway, I said to Jeremy in the hallway that it might not be that
> hard to bolt IOS support onto GIN for cases where the opclass is
> a non-subdividing one, but after looking at the code I'm less sure
> about that. GIN hasn't even got an "amgettuple" code path, just
> "amgetbitmap", and a big part of the reason why is the need to merge
> results from the fastupdate pending list with results from the main
> index area. Not sure how we could deal with that.

I suspect that GIN also avoids several other problems by only offering
an "amgetbitmap", and not an "amgettuple". For example, it doesn't have
to worry about things like numeric display scale, where a datum is
substantively different to another datum, while still being equal
according to opclass semantics (this is an example that I end up citing
in many discussions about indexing). I bet that there are a few more of
those beyond those two, that I haven't thought about.

> Anyway, the larger point here is that right now btree_gin is just a quick
> hack, and it seems like it might be worth putting some more effort into
> it, because the addition of duplicate-compression changes the calculus
> for whether it's useful.

There was also discussion during the hallway track about making nbtree
support deduplication. Jim Finnerty reminded me that there is a patch
from Anastasia that did deduplication in nbtree, which didn't go
anywhere. Heikki independently talked about the possibility that he
would work on this project for the next release, without being prompted
by me.

I think that the fact that nbtree sorts duplicate entries in heap TID
order these days makes that worth looking into again. We can use
something like GIN's varbyte encoding process to compress duplicates
effectively. A lot of the problems (the numeric display scale problem,
pg_upgrade) can be avoided by defining deduplication as something that
happens on a best-effort basis. This is not the case within GIN, where
it's impossible for the main entry tree to have duplicates without at
least storing them in a posting list.

--
Peter Geoghegan
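
To illustrate why heap TID order matters for compressing duplicates, here
is a minimal sketch of delta plus varbyte encoding over a sorted TID list.
It is written in Python for brevity and is not the GIN or nbtree code. The
function names, the (block, offset) tuple representation, and the 11-bit
offset width are assumptions made for the example.

```python
# Illustrative sketch only: names and the packing layout are assumptions,
# not PostgreSQL's actual implementation.

OFFSET_BITS = 11  # assumed number of bits reserved for the in-page offset


def tid_to_int(block, offset):
    """Pack a (block, offset) heap TID into a single integer."""
    return (block << OFFSET_BITS) | offset


def int_to_tid(value):
    """Unpack an integer produced by tid_to_int back into (block, offset)."""
    return value >> OFFSET_BITS, value & ((1 << OFFSET_BITS) - 1)


def encode_varbyte(value):
    """Encode a non-negative integer using 7 payload bits per byte.

    The high bit of each byte is set when more bytes follow.
    """
    out = bytearray()
    while value > 0x7F:
        out.append(0x80 | (value & 0x7F))
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_posting_list(tids):
    """Encode TIDs sorted in ascending order as varbyte-encoded deltas."""
    out = bytearray()
    prev = 0
    for block, offset in tids:
        cur = tid_to_int(block, offset)
        if cur < prev:
            raise ValueError("TIDs must be sorted in ascending order")
        # Neighbouring TIDs are close together, so deltas are small.
        out += encode_varbyte(cur - prev)
        prev = cur
    return bytes(out)


def decode_posting_list(data):
    """Reverse encode_posting_list, returning a list of (block, offset)."""
    tids = []
    prev = 0
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation bit set: keep accumulating
        else:
            prev += value  # delta is complete: add it to the running TID
            tids.append(int_to_tid(prev))
            value = 0
            shift = 0
    return tids


if __name__ == "__main__":
    tids = [(10, 1), (10, 2), (10, 7), (11, 3), (5000, 40)]
    encoded = encode_posting_list(tids)
    assert decode_posting_list(encoded) == tids
    print(len(encoded), "bytes, versus", 6 * len(tids), "as plain 6-byte TIDs")
```

Because the input is sorted, most deltas fit in one or two bytes, which is
where the space saving comes from. An unsorted list would need either
signed or much larger deltas, so the scheme depends on duplicates being
kept in heap TID order.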