Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

Peter Geoghegan Tue, 24 Jan 2023 12:33:43 -0800

On Tue, Jan 24, 2023 at 11:21 AM Robert Haas <[email protected]> wrote:
> > The whole article was about how this DROP TRIGGER pattern worked just
> > fine most of the time, because most of the time autovacuum was just
> > autocancelled. They say this at one point:
> >
> > "The normal autovacuum mechanism is skipped when locks are held in
> > order to minimize service disruption. However, because transaction
> > wraparound is such a severe problem, if the system gets too close to
> > wraparound, an autovacuum is launched that does not back off under
> > lock contention."
>
> If this isn't arguing in favor of exactly what I'm saying, I don't
> know what that would look like.


I'm happy to clear that up. What you said was:

"So I think this sounds like exactly the kind of case I was talking about, where
autovacuums keep getting cancelled until we decide to stop cancelling them.
If so, then they were going to have a problem whenever that happened."

Just because *some* autovacuums get cancelled doesn't mean they *all*
get cancelled. And, even if the rate is quite high, that may not be
much of a problem in itself (especially now that we have the freeze
map). 200 million XIDs usually amounts to a lot of wall clock time.
Even if it is rather difficult to finish up, we only have to get lucky
once.
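
To put rough numbers on the two claims above, here is a back-of-the-envelope sketch. The XID consumption rate and the per-attempt cancellation rate are made-up inputs, and treating attempts as independent is a simplifying assumption, not something measured on a real system.

```python
SECONDS_PER_DAY = 86_400


def wall_clock_days(xid_budget: int, xids_per_second: float) -> float:
    """Days needed to consume xid_budget XIDs at a steady (assumed) rate."""
    return xid_budget / xids_per_second / SECONDS_PER_DAY


def chance_at_least_one_finishes(cancel_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` autovacuums is not cancelled.

    Assumes each attempt is cancelled independently with probability
    cancel_rate, which is an illustrative simplification.
    """
    return 1.0 - cancel_rate ** attempts


if __name__ == "__main__":
    # Hypothetical workload: 100 XIDs/second against a 200 million XID budget.
    print(f"{wall_clock_days(200_000_000, 100):.1f} days")
    # Hypothetical: 90% of attempts get cancelled, 50 attempts in that window.
    print(f"{chance_at_least_one_finishes(0.9, 50):.4f}")
```

With these invented inputs the budget lasts a bit over three weeks, and the chance that every single attempt is cancelled is well under 1%.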

The fact that autovacuum eventually got to the point of requiring an
antiwraparound autovacuum on the problematic table does indeed
strongly suggest that any other, earlier autovacuums were relatively
unlikely to have advanced relfrozenxid in the end -- or at least
couldn't on this one occasion. But that in itself is just not relevant
to our current discussion, since even the tiniest perturbation would
have been enough to prevent a non-aggressive VACUUM from being able to
advance relfrozenxid. Before 15, non-aggressive VACUUMs would throw
away the opportunity to do so just because they couldn't immediately
get a cleanup lock on one single heap page.

It's quite possible that most or all prior aggressive VACUUMs were not
antiwraparound autovacuums, because the dead tuples accounting was
enough to launch an autovacuum at some point after age(relfrozenxid)
exceeded vacuum_freeze_table_age, but before it could reach
autovacuum_freeze_max_age. That would give you a cancellable
aggressive VACUUM -- a VACUUM that actually has a non-zero chance of
advancing relfrozenxid.
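
The window described above can be sketched as a small classifier. This is a simplified model, not the actual autovacuum code: it uses the stock default settings, ignores MultiXact ages, insert-driven autovacuums, and the clamping of vacuum_freeze_table_age, and the names TableState and classify_autovacuum are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Stock defaults for the relevant settings.
VACUUM_FREEZE_TABLE_AGE = 150_000_000
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000
AUTOVACUUM_VACUUM_THRESHOLD = 50
AUTOVACUUM_VACUUM_SCALE_FACTOR = 0.2


@dataclass
class TableState:  # hypothetical container for illustration
    relfrozenxid_age: int  # age(relfrozenxid)
    dead_tuples: int       # estimated dead tuples from the statistics
    reltuples: float       # estimated live tuples


def classify_autovacuum(t: TableState) -> Optional[dict]:
    """Return the kind of autovacuum launched for t, or None if none is."""
    antiwraparound = t.relfrozenxid_age > AUTOVACUUM_FREEZE_MAX_AGE
    dead_tuple_trigger = t.dead_tuples > (
        AUTOVACUUM_VACUUM_THRESHOLD
        + AUTOVACUUM_VACUUM_SCALE_FACTOR * t.reltuples
    )
    if not (antiwraparound or dead_tuple_trigger):
        return None
    aggressive = antiwraparound or t.relfrozenxid_age > VACUUM_FREEZE_TABLE_AGE
    return {
        "aggressive": aggressive,
        "antiwraparound": antiwraparound,
        # Today, only antiwraparound autovacuums are exempt from auto-cancel.
        "auto_cancellable": not antiwraparound,
    }
```

In this model, a table whose age(relfrozenxid) is 160 million gets a cancellable aggressive autovacuum only if the dead tuple estimate happens to cross its threshold; with an estimate of zero, nothing is launched until the age passes 200 million, at which point the autovacuum is no longer cancellable.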

Sure, it's possible that such a cancellable aggressive autovacuum was
indeed cancelled, and that that factor made the crucial difference.
But I find it far easier to believe that there simply was no such
aggressive autovacuum in the first place (not this time), since it
could have only happened when autovacuum thinks that there are
sufficiently many dead tuples to justify launching an autovacuum in
the first place. Which, as we now all accept, is based on highly
dubious sampling by ANALYZE. So I think it's much more likely to be
that factor (dead tuple accounting is bad), as well as the absurd
false dichotomy between aggressive and non-aggressive -- plus the
issue at hand, the auto-cancellation behavior.

I don't claim to know what is inevitable, or what is guaranteed to
work or not work. I only claim that we can meaningfully reduce the
absolute risk by using a fairly simple approach, principally by not
needlessly coupling the auto-cancellation behavior to *all*
autovacuums that are specifically triggered by age(relfrozenxid). As
Andres said at one point, doing those two things at exactly the same
time is just arbitrary.

--
Peter Geoghegan
