Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

Michael Paquier Mon, 23 Mar 2020 21:41:47 -0700

On Mon, Mar 23, 2020 at 06:41:50PM -0700, Andres Freund wrote:
> Which valid scenario can lead to this? Neither the comment, nor commit
> message explain it.


The commit message mentions that concurrent autovacuum jobs can lead
to the creation of non-aggressive and anti-wraparound jobs, which have
no sense because an aggressive and anti-wraparound job was already
done in parallel with a different worker, and that this was possible
because of inconsistent relcache lookups across concurrent jobs.  This
was mentioned upthread.

> Unless you're thinking of scenarios where autovacuum
> and manual vacuum are mixed, I don't really see valid reasons? Normally
> autovacuum's locking + the table_recheck_autovac() check should prevent
> problematic scenarios.
>
> I do see a few scenarios that can trigger this - but they all more or
> less are bugs.

Hmm.  OK.

> It doesn't strike me as a good idea to work around such bugs by silently
> neutering heap_vacuum_rel(). The likelihood of that temporarily covering
> up more severe problems seems significant - they're likely to then later
> bite you with a cluster shutdown.

Saying that, I have been thinking about this one for a couple of days
now and it seems to me that this is a factor contributing to what we
are seeing in [1], and I agree that this is just an incorrect approach
that makes easier to trigger the real underlying issues, while
table_recheck_autovac() ought to be the only code path doing the skip
job.  Note that I have failed to reproduce the behavior of the other
thread though, always finishing with a non-aggressive anti-wraparound
skipped because of an aggressive and anti-wraparound job happened just
before in parallel, and autovacuum was always able to continue
triggering new jobs, keeping the relfrozenxid age at bay.

So I would like to first revert that part, to have a cleaner state to
work on the underlying issues.  A pure revert means also adding back
the log message for non-aggressive and anti-wraparound jobs that
should never exist, which should be replaced by an assertion once all
the holes are fixed.  What do you think?

[1]: 
https://www.postgresql.org/message-id/cae39h23rtx1jkyjwc5tccv34hwwraizacuxomdqdpm+zt5-...@mail.gmail.com
--
Michael

signature.asc
Description: PGP signature

Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

Reply via email to