Re: Emit fewer vacuum records by reaping removable tuples during pruning

Peter Geoghegan Sat, 06 Jan 2024 08:03:52 -0800

On Fri, Jan 5, 2024 at 12:23 PM Robert Haas <robertmh...@gmail.com> wrote:
> > As I think we chatted about before, I eventually would like the option to
> > remove index entries for a tuple during on-access pruning, for OLTP
> > workloads. I.e. before removing the tuple, construct the corresponding index
> > tuple, use it to look up index entries pointing to the tuple. If all the
> > index entries were found (they might not be, if they already were marked
> > dead during a lookup, or if an expression wasn't actually immutable), we can
> > prune without the full index scan.  Obviously this would only be suitable for
> > some workloads, but it could be quite beneficial when you have huge indexes.
> > The reason I mention this is that then we'd have another source of marking
> > items unused during pruning.
>
> I will be astonished if you can make this work well enough to avoid
> huge regressions in plausible cases. There are plenty of cases where
> we do a very thorough job opportunistically removing index tuples.


Right. In particular, bottom-up index deletion works well because it
adds a kind of natural backpressure to one important special case (the
case of non-HOT updates that don't "logically change" any indexed
column). It isn't really all that "opportunistic" in my understanding
of the term -- the overall effect is to *systematically* control bloat
in a way that is actually quite reliable. Like you, I have my doubts
that it would be valuable to be more proactive about deleting
arbitrary dead index tuples. There may be a great
many dead index tuples diffusely spread across an index -- these can
be quite harmless, and not worth proactively cleaning up (even at a
fairly low cost). What we mostly need to worry about is *concentrated*
build-up of dead index tuples in particular leaf pages.
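
To illustrate the diffuse-versus-concentrated distinction, here is a toy
Python sketch. It is not how nbtree decides anything: the function name, the
flat list of leaf page numbers, and the threshold are all hypothetical,
chosen only to show that the same number of dead index tuples can be either
harmless or a problem depending on how they are distributed.

```python
from collections import Counter


def pages_needing_cleanup(dead_tuple_pages, threshold):
    """Return the leaf pages whose dead-tuple count reaches the threshold.

    dead_tuple_pages has one entry per dead index tuple: the (hypothetical)
    leaf page number that holds it.
    """
    per_page = Counter(dead_tuple_pages)
    return sorted(page for page, n in per_page.items() if n >= threshold)


# 20 dead tuples, one per leaf page: diffuse, so no page stands out.
diffuse = list(range(20))

# 20 dead tuples, 18 of them in leaf page 7: concentrated build-up.
concentrated = [7] * 18 + [3, 11]

print(pages_needing_cleanup(diffuse, threshold=10))       # []
print(pages_needing_cleanup(concentrated, threshold=10))  # [7]
```

Both inputs contain 20 dead tuples, but only the second has a leaf page
that is worth targeting.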

A natural question to ask is: what cases remain, where we could stand
to add more backpressure? What other "special case" do we not yet
address? I think that retail index tuple deletion could work well as
part of a limited form of "transaction rollback" that cleans up after
a just-aborted transaction, within the backend that executed the
transaction itself. I suspect that this case now has outsized
importance, precisely because it's the one remaining case where the
system accumulates index bloat without any sort of natural
backpressure. Making the transaction/backend that creates bloat
directly responsible for proactively cleaning it up tends to have a
stabilizing effect over time. The system is made to live within its
means.

We could even fully reverse heap page line pointer bloat under this
"transaction rollback" scheme -- I bet that aborted xacts are a
disproportionate source of line pointer bloat. Barring a hard crash,
or a very large transaction, we could "undo" the physical changes to
relations before permitting the backend to retry the transaction from
scratch. This would be purely an optimization; correctness would not
depend on it.
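
A toy Python model of that "transaction rollback" idea follows. Everything in
it is an assumption made for illustration: the ToyBackend class, the dict
standing in for heap line pointers, and the set standing in for an index are
hypothetical and ignore WAL, visibility, and concurrency entirely. The only
point is that the backend remembers what it wrote, so on abort it can do
retail index deletion and free its own line pointers without any index scan.

```python
class ToyBackend:
    """Toy model: one backend, one heap, one index (all hypothetical)."""

    def __init__(self):
        self.heap = {}      # tid (page, offset) -> indexed key
        self.index = set()  # (key, tid) pairs standing in for index tuples
        self.pending = []   # (key, tid) pairs written by the current xact

    def insert(self, tid, key):
        self.heap[tid] = key
        self.index.add((key, tid))
        self.pending.append((key, tid))

    def commit(self):
        # Nothing to undo; just forget the undo list.
        self.pending.clear()

    def abort(self):
        # Undo our own physical changes, newest first.
        for key, tid in reversed(self.pending):
            self.index.discard((key, tid))  # retail index tuple deletion
            self.heap.pop(tid, None)        # line pointer is reusable again
        self.pending.clear()


backend = ToyBackend()
backend.insert((1, 1), "a")
backend.commit()

backend.insert((1, 2), "b")
backend.insert((1, 3), "c")
backend.abort()

print(backend.heap)   # {(1, 1): 'a'}
print(backend.index)  # {('a', (1, 1))}
```

After the abort, only the committed row and its index entry remain, so the
aborted transaction leaves no bloat behind for VACUUM to find. In this toy the
undo list is unbounded; the "very large transaction" caveat above is exactly
the case where a real implementation would have to give up and fall back.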

--
Peter Geoghegan
