On Tue, Mar 16, 2021 at 6:40 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > Note that I've merged multiple existing functions in vacuumlazy.c into
> > one: the patch merges lazy_vacuum_all_indexes() and lazy_vacuum_heap()
> > into a single function named vacuum_indexes_mark_unused()
>
> I agree to create a function like vacuum_indexes_mark_unused() that
> makes a decision and does index and heap vacuuming accordingly. But
> what is the point of removing both lazy_vacuum_all_indexes() and
> lazy_vacuum_heap()? I think we can simply have
> vacuum_indexes_mark_unused() call those functions. I'm concerned that
> if we add additional criteria to vacuum_indexes_mark_unused() in the
> future the function will become very large.

I agree now. I became overly excited about advertising the fact that
these two functions are logically one thing. That point is important,
but it doesn't require going as far as actually making everything into
one function. Adding some comments would make the point just as clear,
without making vacuumlazy.c even more spaghetti-like. I'll fix it.
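To be concrete, something like this minimal sketch (the signatures are
approximately what vacuumlazy.c has today, and skip_index_vacuuming()
is just a placeholder for whatever criteria we settle on):

    /*
     * Decide whether to do a round of index vacuuming plus heap
     * vacuuming, and perform both steps if so.  The two steps are
     * logically one operation: heap line pointers can only be set
     * LP_UNUSED once the index tuples pointing to them are deleted.
     */
    static void
    vacuum_indexes_mark_unused(Relation onerel, Relation *Irel,
                               IndexBulkDeleteResult **indstats,
                               LVRelStats *vacrelstats,
                               LVParallelState *lps, int nindexes)
    {
        /* Placeholder for future criteria (e.g. wraparound pressure) */
        if (skip_index_vacuuming(vacrelstats))
            return;

        /* Step 1: delete index tuples pointing to dead heap TIDs */
        lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats,
                                lps, nindexes);

        /* Step 2: only now is it safe to set those TIDs LP_UNUSED */
        lazy_vacuum_heap(onerel, vacrelstats);
    }

That keeps the two existing functions intact, while still making it
impossible to call one without at least considering the other.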
> > I wonder if we can add some kind of emergency anti-wraparound vacuum
> > logic to what I have here, for Postgres 14.
>
> +1
>
> I think we can set VACOPT_TERNARY_DISABLED to
> tab->at_params.index_cleanup in table_recheck_autovac() or increase
> the thresholds used to decide whether to skip index vacuuming.

I was worried about the "tupgone = true" special case causing problems
when we decide to do some index vacuuming and some heap
vacuuming/LP_UNUSED-marking, but then later decide to end the VACUUM.
But I now think that getting rid of "tupgone = true" gives us total
freedom to choose what to do, including the freedom to start with index
vacuuming and then give up on it later -- even after we do some amount
of LP_UNUSED-marking (during a VACUUM with multiple index passes,
perhaps due to a low maintenance_work_mem setting). That isn't actually
special at all, because everything will be 100% decoupled.

Whether or not it's a good idea to not start index vacuuming, or to end
it early (e.g. due to XID wraparound pressure), is a complicated
question, and the right approach will be debatable in each case.
However, deciding on the best behavior to address these problems should
have nothing to do with implementation details, and everything to do
with the costs and benefits to users. That now seems possible.

A sophisticated version of the "XID wraparound pressure" logic could
increase reliability without ever being needlessly conservative, just
by evaluating the situation regularly and being prepared to change
course when necessary -- right up until it truly is a matter of
shutting down new XID allocations/the server. It should be possible to
decide to end VACUUM early and still advance relfrozenxid (provided we
have reached the point of finishing all required pruning and freezing,
of course). Highly agile behavior seems quite possible, even if it
takes a while to agree on a good design.

> Aside from whether it's good or bad, strictly speaking, it could
> change the index AM API contract. The documentation of
> amvacuumcleanup() says:
>
> ---
> stats is whatever the last ambulkdelete call returned, or NULL if
> ambulkdelete was not called because no tuples needed to be deleted.
> ---
>
> With this change, we could pass NULL to amvacuumcleanup even though
> the index might have tuples needed to be deleted.

I think that we should add a "Note" box to the documentation that
points out the difference here. Though FWIW my interpretation of the
words "no tuples needed to be deleted" was always "no tuples needed to
be deleted because vacuumlazy.c didn't call ambulkdelete()". After all,
VACUUM can add to tups_vacuumed through pruning inside
heap_prune_chain().
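For what it's worth, nbtree already has to deal with not being called
for a bulk delete: btvacuumcleanup() treats stats == NULL as "decide
for yourself whether a cleanup-only scan is worth it". Abridged from
nbtree.c, from memory (comments mine):

    IndexBulkDeleteResult *
    btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
    {
        /* No-op in ANALYZE ONLY mode */
        if (info->analyze_only)
            return stats;

        /*
         * stats == NULL means vacuumlazy.c never called btbulkdelete().
         * The AM decides on its own whether a cleanup-only scan pays
         * for itself (e.g. to recycle deleted pages).
         */
        if (stats == NULL)
        {
            if (!_bt_vacuum_needs_cleanup(info))
                return NULL;    /* nothing to do at all */

            stats = (IndexBulkDeleteResult *)
                palloc0(sizeof(IndexBulkDeleteResult));
            btvacuumscan(info, stats, NULL, NULL, 0);
        }

        /* ... cross-check stats against info->num_heap_tuples ... */

        return stats;
    }

So from the AM's point of view, a NULL stats argument has only ever
meant "the caller chose not to call ambulkdelete" -- which is
consistent with my reading of the docs.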
It is already possible (though only barely) to not call ambulkdelete()
for indexes (to only call amvacuumcleanup() during cleanup), despite
the fact that heap vacuuming did "delete tuples". It's not that
important, but my point is that the design has always been top-down --
an index AM "needs to delete" whatever it is told it needs to delete.
It has no direct understanding of any higher-level issues.

> As I mentioned in a recent reply, I'm concerned about a case where we
> ran out of maintenance_work_mem and decided not to skip index
> vacuuming, but collected only a few dead tuples in the second index
> vacuuming (i.e., the total amount of dead tuples is slightly larger
> than maintenance_work_mem). In this case, I think we can skip the
> second (i.e., final) index vacuuming, at least in terms of
> maintenance_work_mem. Maybe the same is true in terms of LP_DEAD
> accumulation.

I remember that. It now seems very doable, but time grows short...

I have already prototyped Andres' idea, which was to eliminate the
HEAPTUPLE_DEAD case inside lazy_scan_heap() by restarting pruning for
the same page. I've also moved the pruning into its own function,
lazy_scan_heap_page(), because handling the restart requires that we be
careful not to increment anything until we're sure we won't need to
repeat pruning. This seems to work well, and the tests all pass. What I
have right now is still too rough to post to the list, though.

Even with a pg_usleep(10000) after the call to heap_page_prune() but
before the second/local HeapTupleSatisfiesVacuum() call, we almost
never actually hit the HEAPTUPLE_DEAD case. So the overhead must be
absolutely negligible. Adding a "goto restart" to the HEAPTUPLE_DEAD
case is ugly, but the "tupgone = true" thing is an abomination, so that
seems okay. This approach definitely seems like the way forward,
because it's obviously safe -- it may even be safer, because
heap_prepare_freeze_tuple() kind of expects this behavior today.
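The shape of it is roughly this (a heavily simplified fragment --
argument lists and local variable setup are elided, and the real
lazy_scan_heap_page() is still in flux):

    /* Inside lazy_scan_heap_page(), processing one heap page: */
    retry:

        /* Prune first (remaining heap_page_prune() arguments elided) */
        tups_pruned = heap_page_prune(onerel, buf, ...);

        maxoff = PageGetMaxOffsetNumber(page);
        for (offnum = FirstOffsetNumber; offnum <= maxoff;
             offnum = OffsetNumberNext(offnum))
        {
            /* ... set up "tuple" for this offset number ... */

            switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
            {
                case HEAPTUPLE_DEAD:

                    /*
                     * Tuple became DEAD (e.g. its inserter aborted)
                     * after heap_page_prune() did its visibility
                     * checks.  The window is tiny, so just prune
                     * again.  Nothing has been counted or recorded
                     * for this page yet, so restarting is safe.
                     */
                    goto retry;

                /* ... all other HEAPTUPLE_* cases are unchanged ... */
            }
        }

--
Peter Geoghegan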