On Fri, Jan 5, 2024 at 12:23 PM Robert Haas <robertmh...@gmail.com> wrote:
> > As I think we chatted about before, I eventually would like the option to
> > remove index entries for a tuple during on-access pruning, for OLTP
> > workloads. I.e. before removing the tuple, construct the corresponding index
> > tuple, use it to look up index entries pointing to the tuple. If all the
> > index entries were found (they might not be, if they already were marked
> > dead during a lookup, or if an expression wasn't actually immutable), we can
> > prune without the full index scan. Obviously this would only be suitable for
> > some workloads, but it could be quite beneficial when you have huge indexes.
> > The reason I mention this is that then we'd have another source of marking
> > items unused during pruning.
>
> I will be astonished if you can make this work well enough to avoid
> huge regressions in plausible cases. There are plenty of cases where
> we do a very thorough job opportunistically removing index tuples.
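The retail-deletion idea in the quoted proposal (rebuild the index tuple from the dead heap tuple, look up the matching entries, and only prune if every one is found) can be sketched as a toy model. This is not PostgreSQL code: ToyIndex and try_retail_prune are hypothetical names, a TID is assumed to be a (block, offset) pair, and an index is assumed to be a plain set of (key, TID) pairs rather than a B-Tree.

```python
from dataclasses import dataclass, field
from typing import Callable, Tuple

# Toy model only: a TID is (block, offset); an index is a set of (key, tid) pairs.
# ToyIndex and try_retail_prune are hypothetical names, not PostgreSQL APIs.
Tid = Tuple[int, int]


@dataclass
class ToyIndex:
    key_fn: Callable[[dict], object]  # the indexed column or expression
    entries: set = field(default_factory=set)

    def insert(self, row: dict, tid: Tid) -> None:
        self.entries.add((self.key_fn(row), tid))


def try_retail_prune(row: dict, tid: Tid, indexes: list, unused: set) -> bool:
    """Delete every index entry for a dead heap tuple, then mark its item unused.

    All-or-nothing: if any index lacks the expected entry, nothing is changed
    and the item is left for a full index scan (VACUUM) to deal with.
    """
    # Rebuild each index tuple from the dead heap tuple's column values.
    wanted = [(idx, (idx.key_fn(row), tid)) for idx in indexes]
    # A missing entry may already have been deleted, or may sit under a
    # different key (expression not really immutable); we cannot tell which.
    if not all(entry in idx.entries for idx, entry in wanted):
        return False
    for idx, entry in wanted:
        idx.entries.remove(entry)
    unused.add(tid)  # the heap line pointer can now be marked unused
    return True


by_id = ToyIndex(lambda r: r["id"])
by_name = ToyIndex(lambda r: r["name"].lower())
row = {"id": 1, "name": "Alice"}
for idx in (by_id, by_name):
    idx.insert(row, (0, 1))
unused: set = set()
print(try_retail_prune(row, (0, 1), [by_id, by_name], unused))  # True

# An "immutable" expression that isn't: its result depends on outside state.
setting = {"v": 0}
shaky = ToyIndex(lambda r: (r["id"], setting["v"]))
row2 = {"id": 2, "name": "Bob"}
shaky.insert(row2, (0, 2))
setting["v"] = 1
print(try_retail_prune(row2, (0, 2), [shaky], unused))  # False
```

The sketch is deliberately all-or-nothing: an entry that cannot be found might have been removed already, or might still exist under a different key, so the only safe fallback is to leave the heap item alone for VACUUM's full index scan.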
Right. In particular, bottom-up index deletion works well because it adds a kind of natural backpressure to one important special case (the case of non-HOT updates that don't "logically change" any indexed column). It isn't really all that "opportunistic" in my understanding of the term -- the overall effect is to *systematically* control bloat in a way that is actually quite reliable.

Like you, I have my doubts that it would be valuable to be more proactive about deleting dead index tuples that are just random dead tuples. There may be a great many dead index tuples diffusely spread across an index -- these can be quite harmless, and not worth proactively cleaning up (even at a fairly low cost). What we mostly need to worry about is *concentrated* build-up of dead index tuples in particular leaf pages.

A natural question to ask is: what cases remain, where we could stand to add more backpressure? What other "special case" do we not yet address? I think that retail index tuple deletion could work well as part of a limited form of "transaction rollback" that cleans up after a just-aborted transaction, within the backend that executed the transaction itself. I suspect that this case now has outsized importance, precisely because it's the one remaining case where the system accumulates index bloat without any sort of natural backpressure. Making the transaction/backend that creates bloat directly responsible for proactively cleaning it up tends to have a stabilizing effect over time. The system is made to live within its means.

We could even fully reverse heap page line pointer bloat under this "transaction rollback" scheme -- I bet that aborted xacts are a disproportionate source of line pointer bloat. Barring a hard crash, or a very large transaction, we could "undo" the physical changes to relations before permitting the backend to retry the transaction from scratch. This would just work as an optimization.

--
Peter Geoghegan