On Tue, Oct 4, 2022 at 10:39 AM Jeff Davis <pg...@j-davis.com> wrote:
> On Mon, 2022-10-03 at 22:45 -0700, Peter Geoghegan wrote:
> > Once a table becomes larger than vacuum_freeze_strategy_threshold,
> > VACUUM stops marking pages all-visible in the first place,
> > consistently marking them all-frozen instead.
>
> What are the trade-offs here? Why does it depend on table size?
That's a great question. The table-level threshold vacuum_freeze_strategy_threshold more or less buckets every table into one of two categories: small tables and big tables. Perhaps this seems simplistic to you. That would be an understandable reaction, given the central importance of this threshold. The current default of 4GB could easily have been 8GB or perhaps even 16GB instead.

It's not so much size as the rate of growth over time that matters. We really want to do eager freezing on "growth tables", particularly append-only tables. On the other hand, we don't want to do useless freezing on small, frequently updated tables, like pgbench_tellers or pgbench_branches -- those tables may well require zero freezing, and yet each VACUUM consistently advances relfrozenxid to a very recent value (even on Postgres 15). But "growth" is hard to capture, because in general we have to infer things about the future from the past, which is difficult and messy.

Since it's hard to capture "growth table vs. fixed-size table" directly, we use table size as a proxy. It's far from perfect, but I think that it will work quite well in practice, because most individual tables simply never get very large. It's very common for a relatively small number of tables to grow consistently, without bound (perhaps not strictly append-only tables, but tables where nothing is ever deleted and inserts keep happening). So a simplistic threshold (combined with dynamic per-page decisions about freezing) should be enough to avoid most of the downside of eager freezing. In particular, we will still freeze lazily in tables where eager freezing is obviously very unlikely to be worth it.

In general I think that being correct on average is overrated. It's more important to always avoid being dramatically wrong -- especially if there is no way to course correct in the next VACUUM. Although I think that we have a decent chance of coming out ahead by every available metric, that isn't really the goal. Why should performance stability not have some cost, at least in some cases? I want to keep the cost as low as possible (often a "negative cost" relative to Postgres 15), but overall I am consciously making a trade-off. There are downsides.

--
Peter Geoghegan
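
[Editor's note: to make the size cutoff described above concrete, here is a minimal sketch in C. Only the 4GB default and the lazy-vs-eager split come from the mail itself; choose_freeze_strategy, FreezeStrategy, and the byte-valued threshold variable are invented names for illustration, not the actual patch code.]

#include <stdint.h>
#include <stdio.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

typedef enum
{
    FREEZE_LAZY,                /* freeze opportunistically, mark pages all-visible */
    FREEZE_EAGER                /* freeze whole pages, mark them all-frozen */
} FreezeStrategy;

/*
 * Stand-in for the proposed GUC, expressed in bytes here (an assumption);
 * the default discussed above is 4GB.
 */
static uint64_t vacuum_freeze_strategy_threshold =
    UINT64_C(4) * 1024 * 1024 * 1024;

static FreezeStrategy
choose_freeze_strategy(uint64_t table_blocks)
{
    uint64_t    table_bytes = table_blocks * BLCKSZ;

    /*
     * Small, frequently updated tables stay lazy, since freezing them is
     * likely wasted work.  Tables at or past the threshold are treated as
     * "growth" tables and frozen eagerly, so an unfrozen backlog never
     * accumulates.
     */
    if (table_bytes >= vacuum_freeze_strategy_threshold)
        return FREEZE_EAGER;
    return FREEZE_LAZY;
}

int
main(void)
{
    /* pgbench_tellers-sized table: a few pages, stays lazy */
    printf("10 blocks -> %s\n",
           choose_freeze_strategy(10) == FREEZE_EAGER ? "eager" : "lazy");
    /* 1M blocks = 8GB, past the 4GB default, so frozen eagerly */
    printf("1M blocks -> %s\n",
           choose_freeze_strategy(1024 * 1024) == FREEZE_EAGER ? "eager" : "lazy");
    return 0;
}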