On Tue, Jul 17, 2018 at 5:45 PM, Robert Haas <robertmh...@gmail.com> wrote: > Yes, that's a good point. Also, and I think pretty importantly, this > seems essential if we want to allow retail index tuple deletion, which > has its own set of advantages.
Retail index tuple deletion is clearly something we should have in some form, and is the top reason to pursue the project. I could probably come up with dozens of other reasons if I had to, though. Here are a few of the less obvious ones: * This improves usage_count of heap page buffers automatically, by recognizing correlated references when there are many duplicates. [1] * How else is VACUUM/deletion supposed to work with indirect indexes? I think that it's impossible without either retail deletion/retail updates, unless an AEL is okay. * Global indexes (indexes that simultaneously cover multiple partitions) are just a generalization of this idea, with a partition number attribute before the heap TID attribute. * The sort order of duplicates actually matters quite a lot to the executor, and to the planner, which this lets us reason about in many cases. [2][3] * There is nothing to stop us from exploiting this within tidbitmap.c/bitmap index scans. We can build disjunct lists of sorted TIDs, one per value. Merging lists like this together is typically very cheap, even when there are relatively many distinct lists. We could probably do that on-the-fly. * When building a bitmap for a low cardinality value from a secondary index, why bother even visiting the leaf pages? We can make a lossy bitmap from a subset of the internal pages. * Techniques like bitmap semi-join, which are important for star-schema queries, are dependent on the merging and intersection of fact table bitmaps being cheap (basically, you build several bitmaps through nestloop joins with some dimension table on the outer side, and some fact table index on the inner side -- that's relatively easy to parallelize well). Tom's "Improve OR conditions on joined columns" patch sorts a list of TIDs [4], and I suspect that there is some high-level connection there. * Heap prefetching for nbtree index scans is a lot more effective than it might otherwise be. I really could go on, but you get the idea. The merits of each of these ideas are debatable, but the fact that there are so many ideas that are at least plausible-sounding seems significant to me. > I don't think you're going to be able to rip out the getting-tired > code, though, because presumably we'll have to continue support > existing btree indexes that don't include TIDs for some > probably-not-small number of future releases, even if we decide that > all future btree indexes (and any that are reindexed) should have > TIDs. That's definitely true. There is no way that it's okay to ignore this issue with pg_upgrade. [1] https://postgr.es/m/cah2-wzkrkkw88yq4-bln5n3drfv93p8r+ti-kfbter08p8c...@mail.gmail.com [2] https://postgr.es/m/20170707234119.gn17...@telsasoft.com [3] https://postgr.es/m/520D6610.8040907%40emulex.com#520d6610.8040...@emulex.com [4] https://postgr.es/m/22002.1487099934%40sss.pgh.pa.us -- Peter Geoghegan