On Wed, Nov 23, 2016 at 6:01 PM, Peter Geoghegan <p...@heroku.com> wrote: > * Our behavior with many duplicates in secondary indexes is pretty bad > in general, I suspect.
>From the pie-in-the-sky department: I believe there are snapshot-based systems that don't ever have more than one entry for a logical row in a btree. Instead, they do UNDO-log based snapshot reconstruction for btrees too, generalising what is being discussed for heap tables in this thread. That means that whenever you descend a btree, at each page you have to be prepared to go and look in the UNDO log if necessary on a page-by-page basis. Alternatively, some systems use a shadow table rather than an UNDO log for the old entries, but still privilege queries that need the latest version by keeping only those in the btree. I don't know the details but suspect that both of those approaches might require CSN (= LSN?) based snapshots, so that you can make page-level visibility determinations while traversing the 'current' btree based on a page header. That would reduce both kinds of btree bloat: the primary kind of being entries for dead tuples that still exist in the heap, and the secondary kind being the resultant permanent btree fragmentation that remains even after vacuum removes them. Presumably once you have all that, you can allow index-only-scans without a visibility map. Also, I suspect that such a "3D time travelling" btree would be a requirement for index ordered tables in a snapshot-based RDBMS. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers