On Sun, Aug 28, 2022 at 4:14 PM Nathan Bossart <nathandboss...@gmail.com> wrote: > The idea seems sound to me, and IMO your patch simplifies things nicely, > which might be reason enough to proceed with it.
It is primarily a case of making things simpler. Why would it ever make sense to interpret age differently in the presence of a long running transaction, though only for the FreezeLimit/MultiXactCutoff cutoff calculation? And not for the closely related freeze_table_age/multixact_freeze_table_age calculation? It's hard to imagine that that was ever a deliberate choice. vacuum_set_xid_limits() didn't contain the logic for determining if its caller's VACUUM should be an aggressive VACUUM until quite recently. Postgres 15 commit efa4a9462a put the logic for determining aggressiveness right next to the logic for determining FreezeLimit, which made the inconsistency much more noticeable. It is easy to believe that this was really just an oversight, all along. > However, I'm struggling > to understand when this change would help much in practice. IIUC it will > cause vacuums to freeze a bit more, but outside of extreme cases (maybe > when vacuum_freeze_min_age is set very high and there are long-running > transactions), ISTM it might not have tremendously much impact. Is the > intent to create some sort of long-term behavior change for autovacuum, or > is this mostly aimed towards consistency among the cutoff calculations? I agree that this will have only a negligible impact on the majority (perhaps even the vast majority) of applications. The primary justification for this patch (simplification) seems sufficient, all things considered. Even still, it's possible that it will help in extreme cases. Cases with pathological performance issues, particularly those involving MultiXacts. We already set FreezeLimit to the most aggressive possible value of OldestXmin when OldestXmin has itself already crossed a quasi arbitrary XID-age threshold of autovacuum_freeze_max_age XIDs (i.e. when OldestXmin < safeLimit), with analogous rules for MultiXactCutoff/OldestMxact. Consequently, the way that we set the cutoffs for freezing in pathological cases where (say) there is a leaked replication slot will see a sharp discontinuity in how FreezeLimit is set, within and across tables. And for what? Initially, these pathological cases will end up using exactly the same FreezeLimit for every VACUUM against every table (assuming that we're using a system-wide min_freeze_age setting) -- every VACUUM operation will use a FreezeLimit of `OldestXmin - vacuum_freeze_min_age`. At a certain point that'll suddenly flip -- now every VACUUM operation will use a FreezeLimit of `OldestXmin`. OTOH with the patch they'd all have a FreezeLimit that is tied to the time that each VACUUM started -- which is exactly the FreezeLimit behavior that we'd get if there was no leaked replication slot (at least until OldestXmin finally attains an age of vacuum_freeze_min_age, when it finally becomes unavoidable, even with the patch). There is something to be said for preserving the "natural diversity" of the relfrozenxid values among tables, too. The FreezeLimit we use is (at least for now) almost always going to be very close to (if not exactly) the same value as the NewFrozenXid value that we set relfrozenxid to at the end of VACUUM (at least in larger tables). Without the patch, a once-off problem with a leaked replication slot can accidentally result in lasting problems where all of the largest tables get their antiwraparound autovacuums at exactly the same time. The current behavior increases the risk that we'll accidentally "synchronize" the relfrozenxid values for large tables that had an antiwraparound vacuum during the time when OldestXmin was held back. Needlessly using the same FreezeLimit across each VACUUM operation risks disrupting the natural ebb and flow of things. It's hard to say how much of a problem that is in the real word. But why take any chances? -- Peter Geoghegan