On Tue, Apr 5, 2022 at 5:44 AM Robert Haas <robertmh...@gmail.com> wrote: > The whole idea of decoupling table and index vacuum > supposes that there are situations in which it's worth performing the > first heap pass where we gather the dead line pointers but where it's > not necessary to follow that up as quickly as possible with a second > heap pass to mark dead line pointers unused. I think Peter and I are > in agreement that there are situations in which some indexes need to > be vacuumed much more often than others -- but that doesn't matter if > the heap needs to be vacuumed more frequently than anything else, > because you can't do that without first vacuuming all the indexes.
It's not just an enabler of more frequent index vacuuming (for those indexes that need it the most), though. It's also an enabler of more frequent lazy_scan_prune processing (in particular setting hint bits and freezing), which is probably even more likely to benefit from the decoupling you'd be enabling. I can imagine this having great value in a world where autovacuum scheduling eagerly keeps up with inserts into an append-mostly table, largely avoiding repeating dirtying within lazy_scan_prune, with dynamic feedback. You just need to put off the work of index/heap vacuuming to be able to do that kind of thing. Postgres 14 split the WAL record previously shared by both pruning and vacuuming (called XLOG_HEAP2_CLEAN) into two separate WAL records (called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). That made it easier to spot the fact that we usually have far fewer of the latter WAL records during VACUUM by using pg_waldump. Might be worth doing your own experiments on this. Other instrumentation changes in 14 also helped here. In particular the "%u pages from table (%.2f%% of total) had %lld dead item identifiers removed" line that was added to autovacuum's log output made it easy to spot how little heap vacuuming might really be needed. With some tables it is roughly the opposite way around (as much or even more heap vacuuming than pruning/freezing) -- you'll tend to see that in tables where opportunistic pruning leaves behind a lot of LP_DEAD stubs that only VACUUM can make LP_UNUSED. But, these same LP_DEAD-heavy tables *also* have a very decent chance of benefiting from a better index vacuuming strategy, something *also* enabled by the conveyor belt design. So overall, in either scenario, VACUUM concentrates on problems that are particular to a given table and workload, without being hindered by implementation-level restrictions. -- Peter Geoghegan