On Sun, Mar 21, 2021 at 1:24 AM Greg Stark <st...@mit.edu> wrote:
> What I've seen is an application that regularly ran ANALYZE on a
> table. This worked fine as long as vacuums took less than the interval
> between analyzes (in this case 1h) but once vacuum started taking
> longer than that interval autovacuum would cancel it every time due to
> the conflicting lock.
>
> That would have just continued until the wraparound vacuum which
> wouldn't self-cancel except that there was also a demon running which
> would look for sessions stuck on a lock and kill the blocker -- which
> included killing the wraparound vacuum.

That's a new one! Though clearly it's an example of what I described.
I do agree that sometimes the primary cause is the special rules for
cancellations with anti-wraparound autovacuums.

> And yes, this demon is obviously a terrible idea but of course it was
> meant for killing buggy user queries. It wasn't expecting to find
> autovacuum jobs blocking things. The real surprise for that user was
> that VACUUM could be blocked by things that someone would reasonably
> want to run regularly like ANALYZE.

The infrastructure from my patch to eliminate the tupgone special case
(the patch that fully decouples index and heap vacuuming from pruning
and freezing) ought to enable smarter autovacuum cancellations. It
should be possible to make "canceling" an autovacuum worker actually
instruct the worker to consider the possibility of finishing off the
VACUUM operation very quickly, by simply ending index vacuuming (and
heap vacuuming). It should only be necessary to cancel when that
strategy won't work out, because we haven't finished all required
pruning and freezing yet -- which are the only truly essential tasks
of any "successful" VACUUM operation.

Maybe it would only be appropriate to do something like that for
anti-wraparound VACUUMs, which, as you say, don't get cancelled when
they block the acquisition of a lock (which is a sensible design,
though only because of the specific risk of not managing to advance
relfrozenxid). There wouldn't be a question of canceling an
anti-wraparound VACUUM in the conventional sense with this mechanism.
It would simply instruct the anti-wraparound VACUUM to finish as
quickly as possible by skipping the indexes. Naturally the
implementation wouldn't really need to consider whether that meant the
anti-wraparound VACUUM could end almost immediately, or some time
later -- the point is that it completes ASAP.

--
Peter Geoghegan