On Wed, Mar 17, 2021 at 7:16 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> Since I was thinking that always skipping index vacuuming on
> anti-wraparound autovacuum is legitimate, skipping index vacuuming
> only when we're really close to the point of going into read-only mode
> seems a bit conservative, but maybe a good start. I've attached a PoC
> patch to disable index vacuuming if the table's relfrozenxid is too
> older than autovacuum_freeze_max_age (older than 1.5x of
> autovacuum_freeze_max_age).

Most anti-wraparound VACUUMs are really not emergencies, though. So treating them as special simply because they're anti-wraparound vacuums doesn't seem like the right thing to do. I think that we should dynamically decide to do this when (antiwraparound) VACUUM has already been running for some time. We need to delay the decision until it is almost certainly true that we really have an emergency.

Can you take what you have here, and make the decision dynamic? Delay it until we're done with the first heap scan? This will require rebasing on top of the patch I posted. And then adding a third patch, a little like the second patch -- but not too much like it.

In the second/SKIP_VACUUM_PAGES_RATIO patch I posted today, the function two_pass_strategy() (my new name for the main entry point for calling lazy_vacuum_all_indexes() and lazy_vacuum_heap()) is only willing to perform the "skip index vacuuming" optimization when the call to two_pass_strategy() is the first call and the last call for that entire VACUUM (plus we test the number of heap blocks with LP_DEAD items using SKIP_VACUUM_PAGES_RATIO, of course). It works this way purely because I don't think that we should be aggressive when we've already run out of maintenance_work_mem. That's a bad time to apply a performance optimization.

But what you're talking about now isn't a performance optimization (the mechanism is similar or the same, but the underlying reasons are totally different) -- it's a safety/availability thing. I don't think that you need to be concerned about running out of maintenance_work_mem in two_pass_strategy() when applying logic that is concerned about keeping the database online by avoiding XID wraparound. You just need to have high confidence that it is a true emergency. I think that we can be ~99% sure that we're in a real emergency by using dynamic information about how old relfrozenxid is *now*, and by rechecking a few times during VACUUM.
Probably by rechecking every time we call two_pass_strategy().

I now believe that there is no fundamental correctness issue with teaching two_pass_strategy() to skip index vacuuming when we're low on memory -- it is 100% a matter of costs and benefits. The core skip-index-vacuuming mechanism is very flexible. If we can be sure that it's a real emergency, I think that we can justify behaving very aggressively (letting indexes get bloated is after all very aggressive). We just need to be 99%+ sure that continuing with vacuuming will be worse than ending vacuuming. Which seems possible by making the decision dynamic (and revisiting it at least a few times during a very long VACUUM, in case things change).

--
Peter Geoghegan
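
For illustration only, the two separate reasons for skipping index vacuuming described above could be modeled roughly as follows. This is a small Python sketch, not code from either patch: the names VacState, is_emergency() and should_skip_index_vacuuming() are hypothetical, the value of SKIP_VACUUM_PAGES_RATIO is a placeholder, the direction of the ratio test (skip when few blocks have LP_DEAD items) is an assumption, and the 1.5x multiplier is simply borrowed from the quoted PoC as one possible emergency threshold.

```python
from dataclasses import dataclass

# 200 million is PostgreSQL's default for autovacuum_freeze_max_age.
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000
# Multiplier taken from the quoted PoC; used here as an assumed threshold.
EMERGENCY_AGE_FACTOR = 1.5
# Placeholder value; the real constant's value is not stated in this message.
SKIP_VACUUM_PAGES_RATIO = 0.01


@dataclass
class VacState:
    """Hypothetical per-VACUUM bookkeeping for this sketch."""
    rel_pages: int       # total heap blocks in the table
    lpdead_pages: int    # heap blocks that contain LP_DEAD items
    calls_so_far: int    # earlier two_pass_strategy() calls in this VACUUM
    is_last_call: bool   # True once the heap scan has finished


def is_emergency(relfrozenxid_age: int) -> bool:
    """Safety check, meant to be re-evaluated with the table's current age."""
    return relfrozenxid_age > EMERGENCY_AGE_FACTOR * AUTOVACUUM_FREEZE_MAX_AGE


def should_skip_index_vacuuming(state: VacState, relfrozenxid_age: int) -> bool:
    # Safety/availability path: checked on every call, even when an earlier
    # call happened because maintenance_work_mem ran out.
    if is_emergency(relfrozenxid_age):
        return True

    # Performance path: only when this call is both the first and the last
    # one for the whole VACUUM, so memory was never exhausted.
    only_call = state.calls_so_far == 0 and state.is_last_call
    if not only_call:
        return False

    # Assumed direction of the test: skip when few blocks have LP_DEAD items.
    return state.lpdead_pages < SKIP_VACUUM_PAGES_RATIO * state.rel_pages
```

In this model the emergency test takes priority and ignores how many times two_pass_strategy() has already run, while the performance optimization stays limited to the single-call case.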