On Mon, Mar 23, 2020 at 06:41:50PM -0700, Andres Freund wrote:
> Which valid scenario can lead to this? Neither the comment, nor commit
> message explain it.

The commit message mentions that concurrent autovacuum jobs can lead to
the creation of non-aggressive and anti-wraparound jobs, which make no
sense because an aggressive and anti-wraparound job was already done in
parallel by a different worker, and that this was possible because of
inconsistent relcache lookups across concurrent jobs. This was
mentioned upthread.

> Unless you're thinking of scenarios where autovacuum
> and manual vacuum are mixed, I don't really see valid reasons? Normally
> autovacuum's locking + the table_recheck_autovac() check should prevent
> problematic scenarios.
>
> I do see a few scenarios that can trigger this - but they all more or
> less are bugs.

Hmm. OK.

> It doesn't strike me as a good idea to work around such bugs by silently
> neutering heap_vacuum_rel(). The likelihood of that temporarily covering
> up more severe problems seems significant - they're likely to then later
> bite you with a cluster shutdown.

That said, I have been thinking about this one for a couple of days
now, and it seems to me that this is a factor contributing to what we
are seeing in [1]. I agree that this is just an incorrect approach that
makes it easier to trigger the real underlying issues, while
table_recheck_autovac() ought to be the only code path that skips a
job.

Note that I have failed to reproduce the behavior of the other thread
though. I always ended up with a non-aggressive anti-wraparound job
being skipped because an aggressive and anti-wraparound job had been
done just before in parallel, and autovacuum was always able to
continue triggering new jobs, keeping the relfrozenxid age at bay.

So I would like to first revert that part, to have a cleaner state to
work on the underlying issues. A pure revert also means adding back the
log message for non-aggressive and anti-wraparound jobs, which should
never exist; that message should be replaced by an assertion once all
the holes are fixed. What do you think?

[1]: https://www.postgresql.org/message-id/cae39h23rtx1jkyjwc5tccv34hwwraizacuxomdqdpm+zt5-...@mail.gmail.com
--
Michael