On Tue, Dec 7, 2021 at 3:44 PM Peter Geoghegan <p...@bowt.ie> wrote:
> Fair enough, but even then we still ultimately have to generate a
> final number that represents how close we are to a configurable "do an
> autovacuum" threshold (such as an autovacuum_vacuum_scale_factor-based
> threshold) -- the autovacuum.c side of this (the consumer side)
> fundamentally needs the model to reduce everything to a one
> dimensional number (even though the reality is that there isn't just
> one dimension). This single number (abstract bloat units, abstract
> dead tuples, whatever) is a function of things like the count of dead
> HOT chains, perhaps the concentration of dead tuples on heap pages,
> whatever -- but it's not the same thing as any one of those things we
> count.
>
> I think that this final number needs to be denominated in abstract
> units -- we need to call those abstract units *something*. I don't
> care what that name ends up being, as long as it reflects reality.

If we're only trying to decide whether or not to vacuum a table, we
don't need units: the output is a Boolean. If we're trying to decide
on an order in which to vacuum tables, then we need units. But such
units can't be anything related to dead tuples, because vacuum can be
needed based on XID age, or MXID age, or dead tuples. The units would
have to be something like abstract vacuum-urgency units (if higher is
more urgent) or abstract remaining-headroom-before-catastrophe units
(if lower is more urgent).

Ignoring wraparound considerations for a moment, I think that we might
want to consider having multiple thresholds and vacuuming the table if
any one of them is met. For example, suppose a table qualifies for
vacuum when %-of-not-all-visible-pages > some-threshold, or
alternatively when %-of-index-tuples-thought-to-be-dead >
some-other-threshold.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
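
To make the multiple-threshold idea above concrete, here is a minimal sketch in Python. Everything in it is hypothetical: TableStats, needs_vacuum, urgency, and the default threshold values are illustrative names and numbers, not anything that exists in autovacuum.c. The Boolean check follows the "any one threshold is met" rule directly. The urgency score (the largest ratio of a metric to its threshold) is only one assumed way to get abstract vacuum-urgency units for ordering tables, and it ignores XID and MXID age just as the paragraph above does.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical per-table inputs; not a PostgreSQL data structure."""
    total_pages: int
    not_all_visible_pages: int
    index_tuples: int
    dead_index_tuples: int  # an estimate ("thought to be dead")


def fraction(part: int, whole: int) -> float:
    """Return part/whole, treating an empty table or index as 0."""
    return part / whole if whole > 0 else 0.0


def metric_ratios(stats: TableStats,
                  page_threshold: float,
                  index_threshold: float) -> list[float]:
    """Each metric divided by its own threshold (assumed thresholds > 0)."""
    pages = fraction(stats.not_all_visible_pages, stats.total_pages)
    index = fraction(stats.dead_index_tuples, stats.index_tuples)
    return [pages / page_threshold, index / index_threshold]


def needs_vacuum(stats: TableStats,
                 page_threshold: float = 0.2,
                 index_threshold: float = 0.1) -> bool:
    """Boolean decision: vacuum if any single threshold is exceeded."""
    return any(r > 1.0 for r in metric_ratios(stats, page_threshold,
                                              index_threshold))


def urgency(stats: TableStats,
            page_threshold: float = 0.2,
            index_threshold: float = 0.1) -> float:
    """Abstract urgency units (higher is more urgent).

    Assumption: the most-exceeded threshold dominates, so the score is
    the maximum metric-to-threshold ratio.
    """
    return max(metric_ratios(stats, page_threshold, index_threshold))
```

With these definitions, needs_vacuum answers the yes/no question without any units, while sorting tables by urgency in descending order gives a vacuum order in which a table far past one threshold outranks a table slightly past another.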