Robert Haas has written on the subject of useless vacuuming, here: http://rhaas.blogspot.com/2020/02/useless-vacuuming.html
I'm sure at least a few of us have thought about the problem at some point. I would like to discuss how we can actually avoid useless vacuuming, and what our goals should be.

I am currently working on decoupling advancing relfrozenxid from tuple freezing [1]. That is, I'm teaching VACUUM to keep track of the information it needs to generate an "optimal value" for the table's final relfrozenxid: the oldest XID that might still be in the table, which is the most recent value that is still safe to use. This patch is based on the observation that we don't actually have to use the FreezeLimit cutoff for our new pg_class.relfrozenxid. We need only obey the basic relfrozenxid invariant, which is that the final value must be <= any extant XID in the table. Using FreezeLimit is needlessly conservative.

My draft patch to implement the optimization (which builds on the patches already posted to [1]) will reliably set pg_class.relfrozenxid to the same VACUUM's precise original OldestXmin once certain conditions are met, and those conditions are reasonably common. For example, the same precise OldestXmin XID is used for relfrozenxid in the event of a manual VACUUM (without FREEZE) on a table that was just bulk-loaded, assuming the system is otherwise idle. Setting relfrozenxid to the precise optimal value happens on a best-effort basis, without needlessly tying that to things like when or how we freeze tuples.

It now occurs to me to push this patch in another direction, on top of all that: the OldestXmin behavior hints at a precise, robust way of defining "useless vacuuming". We can make skipping a VACUUM (that is, deciding that it "definitely won't be useful if allowed to execute") conditional on whether our preexisting pg_class.relfrozenxid precisely equals the OldestXmin newly acquired for the about-to-begin VACUUM operation. (I think we'd also want an analogous test confirming that pg_class.relminmxid cannot change.)
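To make the relfrozenxid invariant and the proposed skip test concrete, here is a minimal Python sketch. It is not PostgreSQL code. `xid_precedes` mimics the modulo-2^32 comparison that PostgreSQL's TransactionIdPrecedes() performs on XIDs. `final_relfrozenxid` and `vacuum_is_useless` are hypothetical names. The sketch assumes VACUUM starts from OldestXmin and ratchets back to the oldest XID it leaves unfrozen. It also reads the relminmxid test as "relminmxid already equals the oldest MultiXactId cutoff", which is an interpretation rather than something the proposal spells out.

```python
"""Sketch of relfrozenxid bookkeeping; Python stand-ins, not PostgreSQL source."""

MASK32 = 0xFFFFFFFF
FIRST_NORMAL_XID = 3  # XIDs 0, 1 and 2 are special (invalid, bootstrap, frozen)


def xid_precedes(a: int, b: int) -> bool:
    """True if XID a is logically older than XID b, allowing for wraparound."""
    if a < FIRST_NORMAL_XID or b < FIRST_NORMAL_XID:
        return a < b  # special XIDs compare as plain integers
    # Treat the 32-bit difference as signed: "negative" means a is older.
    return ((a - b) & MASK32) >= 0x80000000


def final_relfrozenxid(oldest_xmin: int, unfrozen_xids) -> int:
    """Hypothetical helper: start at OldestXmin, ratchet back per unfrozen XID.

    The result is <= every XID left unfrozen in the table (the relfrozenxid
    invariant), without falling back on a more conservative cutoff such as
    FreezeLimit.
    """
    result = oldest_xmin
    for xid in unfrozen_xids:
        if xid_precedes(xid, result):
            result = xid
    return result


def vacuum_is_useless(relfrozenxid: int, relminmxid: int,
                      oldest_xmin: int, oldest_mxact: int) -> bool:
    """Hypothetical skip test: neither relfrozenxid nor relminmxid could advance."""
    return relfrozenxid == oldest_xmin and relminmxid == oldest_mxact


if __name__ == "__main__":
    print(final_relfrozenxid(1000, []))           # 1000: nothing left unfrozen
    print(final_relfrozenxid(1000, [990, 1005]))  # 990: XID 990 stays behind
    print(xid_precedes(4294967290, 5))            # True: 5 is post-wraparound
    print(vacuum_is_useless(1000, 50, 1000, 50))  # True
```

Starting from OldestXmin rather than FreezeLimit is what lets the final value land exactly on OldestXmin whenever nothing older is left behind. That same equality then becomes the cheap test for "this VACUUM cannot advance anything".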
This definition does seem to be close to ideal: we're virtually assured that there will be no more useful work for us to do, in a way that is grounded in theory but still quite practical. But it's not a slam dunk. A person could still argue that we shouldn't cancel the VACUUM before it has begun, even when all these conditions have been met. This would not be a particularly strong argument, mind you, but it's still worth taking seriously. We need an exact problem statement that justifies whatever definition of "useless VACUUM" we settle on.

Here are arguments *against* the skipping behavior I sketched out:

* Tuples left behind by an aborted transaction might need to be cleaned up, and that cleanup should be able to go ahead despite the unchanged OldestXmin. (I think that this is the argument with the most merit, by quite a bit.)
* In general, index AMs may want to do deferred cleanup, say to place previously deleted pages in the FSM. Although in practice the recycling-safety criteria used by nbtree and GiST make that impossible in this scenario, there is no fundamental reason why they need to work that way. (XIDs are used, but only because they provide a conveniently available notion of "logical time" that is sufficient to implement what Lanin & Shasha call "the drain technique".) Plus, GIN really could do real work in amvacuumcleanup, for its pending list. There are bound to be a handful of marginal things like this.
* Who are we to intervene like this, anyway? (This argument carries much more weight if we don't limit skipping to autovacuum worker operations.)

Offhand, I suspect that we should only consider skipping "useless" anti-wraparound autovacuums (not other kinds of autovacuums, and not manual VACUUMs). The arguments against skipping are weakest for the anti-wraparound case, and the arguments in favor are particularly strong: we should specifically avoid starting a useless (and possibly time-consuming) anti-wraparound autovacuum, because it could easily block an actually useful autovacuum launched some time later.
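The narrower policy (skip only "useless" anti-wraparound autovacuums) might be sketched as follows. This is a Python illustration, not PostgreSQL code. `VacuumRequest` and `should_skip` are hypothetical names, not existing PostgreSQL structures. The relminmxid check assumes the test compares relminmxid against the oldest MultiXactId cutoff, which is an interpretation rather than something stated in the proposal.

```python
from dataclasses import dataclass


@dataclass
class VacuumRequest:
    """Hypothetical summary of a VACUUM that is about to begin."""
    is_autovacuum: bool
    is_wraparound: bool  # anti-wraparound autovacuum
    relfrozenxid: int    # current pg_class.relfrozenxid
    relminmxid: int      # current pg_class.relminmxid
    oldest_xmin: int     # OldestXmin acquired for this VACUUM
    oldest_mxact: int    # assumed: oldest MultiXactId cutoff for this VACUUM


def should_skip(req: VacuumRequest) -> bool:
    """Skip only anti-wraparound autovacuums that cannot advance either horizon."""
    if not (req.is_autovacuum and req.is_wraparound):
        return False  # manual VACUUMs and ordinary autovacuums always run
    return (req.relfrozenxid == req.oldest_xmin
            and req.relminmxid == req.oldest_mxact)


if __name__ == "__main__":
    stuck = VacuumRequest(True, True, 1000, 50, 1000, 50)
    print(should_skip(stuck))   # True
    manual = VacuumRequest(False, False, 1000, 50, 1000, 50)
    print(should_skip(manual))  # False
```

Checking the kind of VACUUM before looking at the horizons keeps manual VACUUMs and ordinary autovacuums entirely outside the new behavior. Those are exactly the cases where the arguments against skipping carry the most weight.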
We should aim to be in a position to launch an anti-wraparound autovacuum that can actually advance relfrozenxid as soon as that becomes possible (e.g. when the DBA drops an old replication slot that was holding back each VACUUM's OldestXmin). Skipping makes us much more responsive in that situation, which seems like it might matter a lot in practice, since it minimizes the risk of wraparound failure.

There is also a strong argument for logging our failure to clean up anything in any autovacuum. We don't do nearly enough alerting when stuff like this happens (possibly because "useless" is such a squishy concept right now?). Even just logging something requires defining "useless VACUUM operation" in a way that is both reliable and proportionate, so logging alone doesn't let us sidestep that hard problem.

[1] https://commitfest.postgresql.org/36/3433/

--
Peter Geoghegan