On Tue, Jan 17, 2023 at 5:56 PM Peter Geoghegan <p...@bowt.ie> wrote:
> > Why do you think that?
>
> For the reasons I gave about statistics, random sampling, the central
> limit theorem. All that stuff. This matches the experience of Andres.
> And is obviously the only explanation behind the reliance on
> antiwraparound autovacuums for cleaning up bloat in larger OLTP
> databases. It just fits: the dead tuples approach can sometimes be so
> completely wrong that even an alternative triggering condition based
> on something that is virtually unrelated to the thing we actually care
> about can do much better in practice. Consistently, reliably, for a
> given table/workload.
Hmm, I don't know. I have no intuition one way or the other for whether
we're undercounting dead tuples, and I don't understand what would cause
us to do that. I thought that we tracked that accurately, as part of the
statistics system, not by sampling (pg_stat_all_tables.n_dead_tup). But I
think there are a number of other explanations for why we tend to rely on
antiwraparound vacuums more than we should. Auto-cancellation. Skipping
tables that are locked, or pages that are pinned. A cost limit that is too
low relative to the size of the database, so that eventually all tables
are in wraparound danger all the time. The fact that we can vacuum tables
uselessly, without accomplishing anything, because the XID horizon is too
new, but we don't know that so we just try to vacuum anyway, and then we
repeat that useless work in an infinite loop. The fact that the system's
idea of when a vacuum needs to happen grows with
autovacuum_vacuum_scale_factor, but that actually gets too big too fast,
so that eventually it never triggers vacuuming at all, or at least not
before XID age does.

I think we ought to fire autovacuum_vacuum_scale_factor out of an airlock.
It's not the right model, and I think many people have been aware that
it's not the right model for a decade, and we haven't been creative enough
to come up with anything better. We *know* that you have to lower this
value for large tables or they just don't get vacuumed often enough. That
means we have some idea how often they ought to be vacuumed. I'm sure I'm
not the person who has the best intuition on that point, but I bet people
who have been responsible for large production systems have some decent
ideas in that area. We should find out what that intuition is and come up
with a new formula that matches the intuition of people with experience
doing this sort of thing. e.g.

1. When computing autovacuum_vacuum_threshold + table_size *
autovacuum_vacuum_scale_factor, if the result exceeds the value of a new
parameter autovacuum_vacuum_maximum_threshold, clamp the result to that
value.

2. When computing autovacuum_vacuum_threshold + table_size *
autovacuum_vacuum_scale_factor, if the result exceeds 80% of the number of
dead TIDs we could store, clamp it to that number.

3. Change the formula altogether to use a square root or a cube root or a
logarithm somewhere.

I think we also ought to invent some sort of better cost limit system that
doesn't shoot you in the foot automatically as the database grows. Nobody
actually wants to limit the rate at which the database vacuums stuff to a
constant. What they really want to do is limit it to a rate that is
somewhat faster than the minimum rate needed to avoid disaster. We should
try to develop metrics for whether vacuum is keeping up. I think one good
one would be duty cycle -- if we have N vacuum workers, then over a period
of K seconds we could have done as much as N*K process-seconds of vacuum
work, and as little as 0. So figure out how many seconds of vacuum work we
actually did, and divide that by N*K to get a percentage. If it's over,
say, 90%, then we are not keeping up. We should dynamically raise the cost
limit until we do, and drop it back down later when things look better.
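To make that concrete, here's a rough standalone sketch -- not patch-ready
code, and autovacuum_vacuum_maximum_threshold plus the 90% cutoff are just
the hypothetical knobs from the proposals above -- of how the clamped
trigger threshold from #1 and the duty-cycle check might be computed:

    #include <stdbool.h>
    #include <stdio.h>

    /*
     * Proposal 1 above: compute the usual trigger threshold, but clamp it
     * to a hypothetical new GUC, autovacuum_vacuum_maximum_threshold.
     */
    static double
    vacuum_trigger_threshold(double reltuples,
                             double vac_base_thresh,   /* autovacuum_vacuum_threshold */
                             double vac_scale_factor,  /* autovacuum_vacuum_scale_factor */
                             double vac_max_thresh)    /* hypothetical new maximum */
    {
        double      thresh = vac_base_thresh + vac_scale_factor * reltuples;

        return (thresh > vac_max_thresh) ? vac_max_thresh : thresh;
    }

    /*
     * Duty-cycle metric: with nworkers workers over a window of
     * window_secs seconds, we could have done at most
     * nworkers * window_secs process-seconds of vacuum work.  If the
     * work actually done exceeds cutoff (e.g. 90%) of that, autovacuum
     * is saturated and the cost limit should be raised.
     */
    static bool
    vacuum_is_saturated(int nworkers, double window_secs,
                        double busy_secs, double cutoff)
    {
        double      capacity = nworkers * window_secs;

        return capacity > 0 && busy_secs / capacity > cutoff;
    }

    int
    main(void)
    {
        /*
         * 100M-row table with the default settings (threshold 50, scale
         * factor 0.2): the unclamped trigger would be ~20M dead tuples,
         * clamped here to an assumed maximum of 5M.
         */
        printf("trigger at %.0f dead tuples\n",
               vacuum_trigger_threshold(100e6, 50, 0.2, 5e6));

        /* 3 workers busy for 285 of a possible 300 process-seconds: 95% */
        printf("saturated: %s\n",
               vacuum_is_saturated(3, 100.0, 285.0, 0.9) ? "yes" : "no");

        return 0;
    }

The exact cutoff, and how aggressively we ramp the cost limit up and back
down, would obviously need some experimentation.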
I don't actually see any reason why dead tuples, even counted in a
relatively stupid way, isn't fundamentally good enough to get all tables
vacuumed before we hit the XID age cutoff. It doesn't actually do that
right now, but I feel like that must be because we're doing other stupid
things, not because there's anything that terrible about the metric as
such. Maybe that's wrong, but I find it hard to imagine. If I imagine a
world where vacuum always gets started when the number of dead tuples hits
some reasonable bound (rather than the unreasonable bound that the scale
factor stuff computes) and it always cleans up those dead tuples (instead
of doing a lot of work to clean up nothing at all, or doing a lot of work
to clean up only a small fraction of those dead tuples, or cancelling
itself, or skipping the table that has the problem because it's locked, or
running with an unreasonably low cost limit, or otherwise being unable to
GET THE JOB DONE) then how do we ever reach autovacuum_freeze_max_age? I
think it would still be possible, but only if the XID consumption rate of
the server is so high that we chunk through 300 million XIDs in the time
it takes to perform an un-throttled vacuum of the table. I think that's a
real threat and will probably be a bigger one in ten years, but it's only
one of many things that are going wrong right now.

> Even if we were omniscient, we still wouldn't be omnipotent.

A sound theological point!

> We're
> still subject to the laws of physics. VACUUM would still be something
> that more or less works at the level of the whole table, or not at
> all. So being omniscient seems kinda overrated to me. Adding more
> information does not in general lead to better outcomes.

Yeah, I think that's true. In particular, it's not much use being
omniscient but stupid. It would be better to have limited information and
be smart about what you did with it.

> What I'm arguing is that principally relying on any one thing is
> garbage. If you have only one thing that creates pressure to VACUUM
> then there can be a big impact whenever it turns out to be completely
> wrong. Whereas if VACUUM can run because of (say) 3 moderate signals
> taken together, then it's much less likely that we'll be completely
> wrong. In general my emphasis is on avoiding disaster in all its
> forms. Vacuuming somewhat early more often is perhaps suboptimal, but
> far from a disaster. It's the kind of thing that we can manage.

True, although it can be overdone. An extra vacuum on a big table with
some large indexes that end up getting scanned can be very expensive even
if the table itself is almost entirely all-visible. We can't afford to
make too many mistakes in the direction of vacuuming early in such cases.

-- 
Robert Haas
EDB: http://www.enterprisedb.com