On Wed, Jan 18, 2023 at 6:10 PM Andres Freund <and...@anarazel.de> wrote:
> > This creates an awkward but logical question, though: what if
> > dead_tuples doesn't go down at all? What if VACUUM actually has to
> > increase it, because VACUUM runs so slowly relative to the workload?
>
> Sure, that can happen - but it's not made better by having wrong stats :)
Maybe it's that simple. It's reasonable to wonder how far we want to go
with letting dead tuples grow and grow, even when VACUUM is running
constantly. It's not a question that has an immediate and obvious answer IMV.

Maybe the real question is: is this an opportunity to signal to the user
(say via a LOG message) that VACUUM cannot keep up? That might be very
useful, in a general sort of way (not just to avoid new problems).

> We have reasonably sophisticated accounting in pgstats what newly live/dead
> rows a transaction "creates". So an obvious (and wrong) idea is just decrement
> reltuples by the number of tuples removed by autovacuum.

Did you mean dead_tuples?

> But we can't do that,
> because inserted/deleted tuples reported by backends can be removed by
> on-access pruning and vacuumlazy doesn't know about all changes made by its
> call to heap_page_prune().

I'm not following here. Perhaps this is a good sign that I should stop
working for the day. :-)

> But I think that if we add a
>   pgstat_count_heap_prune(nredirected, ndead, nunused)
> around heap_page_prune() and a
>   pgstat_count_heap_vacuum(nunused)
> in lazy_vacuum_heap_page(), we'd likely end up with a better approximation
> than what vac_estimate_reltuples() does, in the "partially scanned" case.

What does vac_estimate_reltuples() have to do with dead tuples?

--
Peter Geoghegan
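
PS: To check that I'm reading the proposal right, here is a rough,
standalone sketch of the accounting I think you're describing. The
pgstat_count_heap_prune() and pgstat_count_heap_vacuum() names are the
ones from your mail; the stats struct, the exact decrements, and the
numbers in main() are just my guess at one plausible interpretation,
not working backend code.

#include <stdio.h>

/* stand-in for the per-table pgstats entry; real struct differs */
typedef struct PgStatTabEntry
{
    long        dead_tuples;    /* current estimate of dead tuples */
} PgStatTabEntry;

static PgStatTabEntry tabentry = {.dead_tuples = 1000};

/*
 * Would wrap heap_page_prune(): pruning reports how many items it
 * redirected, marked LP_DEAD, or set LP_UNUSED.  Exactly which of these
 * should decrement the dead-tuple counter is the open question; this
 * sketch treats redirected/unused items as no longer dead, and leaves
 * LP_DEAD stubs counted as dead until vacuum removes them.
 */
static void
pgstat_count_heap_prune(long nredirected, long ndead, long nunused)
{
    (void) ndead;
    tabentry.dead_tuples -= nredirected + nunused;
    if (tabentry.dead_tuples < 0)
        tabentry.dead_tuples = 0;
}

/*
 * Would be called from lazy_vacuum_heap_page(): the second heap pass
 * turns LP_DEAD items into LP_UNUSED, so those clearly stop being dead.
 */
static void
pgstat_count_heap_vacuum(long nunused)
{
    tabentry.dead_tuples -= nunused;
    if (tabentry.dead_tuples < 0)
        tabentry.dead_tuples = 0;
}

int
main(void)
{
    pgstat_count_heap_prune(10, 40, 5);     /* opportunistic prune */
    pgstat_count_heap_vacuum(300);          /* vacuum's second pass */
    printf("dead_tuples now %ld\n", tabentry.dead_tuples);
    return 0;
}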