Re: Emit fewer vacuum records by reaping removable tuples during pruning

Peter Geoghegan Sat, 06 Jan 2024 08:35:05 -0800

On Fri, Jan 5, 2024 at 12:57 PM Andres Freund <[email protected]> wrote:
> > I will be astonished if you can make this work well enough to avoid
> > huge regressions in plausible cases. There are plenty of cases where
> > we do a very thorough job opportunistically removing index tuples.
>
> These days the AM is often involved with that, via
> table_index_delete_tuples()/heap_index_delete_tuples(). That IIRC has to
> happen before physically removing the already-marked-killed index entries. We
> can't rely on being able to actually prune the heap page at that point, there
> might be other backends pinning it, but often we will be able to. If we were
> to prune below heap_index_delete_tuples(), we wouldn't need to recheck that
> index again during "individual tuple pruning", if the to-be-marked-unused heap
> tuple is one of the tuples passed to heap_index_delete_tuples(). Which
> presumably will be very commonly the case.


I don't understand. Making heap_index_delete_tuples() prune heap pages
in passing such that we can ultimately mark dead heap tuples LP_UNUSED
necessitates high-level coordination -- it has to happen at a level
much higher than heap_index_delete_tuples(). In other words, making it
all work safely requires the same high-level context that makes it
safe for VACUUM to set a stub LP_DEAD line pointer to LP_UNUSED (index
tuples must never be allowed to point to TIDs/heap line pointers that
can be concurrently recycled).
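
To make the invariant concrete, here is a minimal toy sketch of the line
pointer life cycle. It is not PostgreSQL code: ToyTable, its methods, and
the check_indexes flag are all hypothetical names invented for
illustration, and the model assumes a single heap page and a single index.
It only shows why LP_DEAD -> LP_UNUSED needs knowledge of the indexes,
while pruning down to LP_DEAD does not.

```python
from enum import Enum


class LP(Enum):
    NORMAL = "normal"
    DEAD = "dead"      # stub: tuple storage is gone, TID is still reserved
    UNUSED = "unused"  # TID may be recycled for an unrelated new tuple


class ToyTable:
    """Hypothetical model: one heap page plus one index, nothing more."""

    def __init__(self):
        self.line_pointers = {}  # offset -> LP state
        self.index = {}          # key -> offset (stands in for a TID)

    def insert(self, key):
        # Reuse the lowest LP_UNUSED offset if there is one, else extend.
        free = sorted(o for o, s in self.line_pointers.items()
                      if s is LP.UNUSED)
        offset = free[0] if free else len(self.line_pointers) + 1
        self.line_pointers[offset] = LP.NORMAL
        self.index[key] = offset
        return offset

    def prune(self, offset):
        # Opportunistic pruning: needs no knowledge of any index, because
        # the LP_DEAD stub keeps the TID reserved.
        self.line_pointers[offset] = LP.DEAD

    def index_points_at(self, offset):
        return any(o == offset for o in self.index.values())

    def mark_unused(self, offset, check_indexes=True):
        if self.line_pointers[offset] is not LP.DEAD:
            raise ValueError("only an LP_DEAD stub can become LP_UNUSED")
        if check_indexes and self.index_points_at(offset):
            raise RuntimeError("an index entry still points at this TID")
        self.line_pointers[offset] = LP.UNUSED

    def vacuum(self):
        dead = {o for o, s in self.line_pointers.items() if s is LP.DEAD}
        # First remove every index entry that points at a dead stub ...
        self.index = {k: o for k, o in self.index.items() if o not in dead}
        # ... and only then let the stubs become reusable.
        for offset in dead:
            self.mark_unused(offset)
```

In this model, calling mark_unused(offset, check_indexes=False) on a stub
that the index still references, followed by another insert(), leaves the
old index entry pointing at the new, unrelated tuple. That is the failure
that the higher-level coordination exists to rule out.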

Obviously my idea of "a limited form of transaction rollback" has the
required high-level context available, which is the crucial factor
that allows it to safely reverse all bloat -- even line pointer bloat
(which is traditionally something that only VACUUM can do safely). I
have a hard time imagining a scheme that can do that outside of VACUUM
without directly targeting some special case, such as the case that
I'm calling "transaction rollback". In other words, I have a hard time
imagining how this would ever be practical as part of any truly
opportunistic cleanup process. AFAICT the dependency between indexes
and the heap is just too delicate for such a scheme to ever really be
practical.

> At least for nbtree, we are much more aggressive about marking index entries
> as killed, than about actually removing the index entries. "individual tuple
> pruning" would have to look for killed-but-still-present index entries, not
> just for "live" entries.

These days having index tuples directly marked LP_DEAD is surprisingly
unimportant to heap_index_delete_tuples(). The batching optimization
implemented by _bt_simpledel_pass() tends to be very effective in
practice. We only need to have the right *general* idea about which
heap pages to visit -- which heap pages will yield some number of
deletable index tuples.
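
A rough sketch of that batching idea, as I understand it, follows. It is a
toy model rather than the nbtree implementation: the function names, the
(heap_block, offset, marked_dead) tuple layout, and the heap_is_dead
callback are all assumptions made up for illustration. The point is only
that the LP_DEAD marks pick which heap blocks to visit, and every index
tuple on the leaf page that points into one of those blocks gets checked,
marked or not.

```python
from collections import defaultdict


def simple_deletion_candidates(leaf_tuples):
    """Group candidate TIDs by heap block.

    leaf_tuples: iterable of (heap_block, offset, marked_dead) triples
    describing the index tuples on one leaf page (hypothetical layout).
    """
    # A block is worth visiting if at least one index tuple that points
    # into it is already marked dead.
    promising_blocks = {blk for blk, _, marked in leaf_tuples if marked}

    by_block = defaultdict(list)
    for blk, off, _ in leaf_tuples:
        # Unmarked tuples come along for free when they share a block
        # with a marked one.
        if blk in promising_blocks:
            by_block[blk].append((blk, off))
    return dict(by_block)


def delete_pass(leaf_tuples, heap_is_dead):
    """Return the set of TIDs whose index tuples can be deleted.

    heap_is_dead: callable taking a (heap_block, offset) TID and reporting
    whether the heap considers it dead (stands in for the heap visit).
    """
    candidates = simple_deletion_candidates(leaf_tuples)
    deletable = set()
    for blk in sorted(candidates):  # visit each promising block once
        for tid in candidates[blk]:
            if heap_is_dead(tid):
                deletable.add(tid)
    return deletable
```

For example, with two marked tuples pointing into blocks 10 and 12, an
unmarked tuple pointing into block 10 is also checked and can be deleted
if the heap agrees, while a dead-but-unmarked tuple pointing into block 11
is simply not considered in this pass.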


--
Peter Geoghegan
