On Wed, Jan 18, 2023 at 2:22 PM Andres Freund <and...@anarazel.de> wrote:
> The problem with the change is here:
>
> /*
>  * Okay, we've covered the corner cases.  The normal calculation is to
>  * convert the old measurement to a density (tuples per page), then
>  * estimate the number of tuples in the unscanned pages using that figure,
>  * and finally add on the number of tuples in the scanned pages.
>  */
> old_density = old_rel_tuples / old_rel_pages;
> unscanned_pages = (double) total_pages - (double) scanned_pages;
> total_tuples = old_density * unscanned_pages + scanned_tuples;
> return floor(total_tuples + 0.5);
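To make the feedback loop concrete, here is a minimal standalone sketch (my own simplification, not the real vac_estimate_reltuples(), which takes a Relation among other things): it just re-applies the quoted arithmetic to a hypothetical 100,000-page table whose pg_class entry starts at 10,000,000 tuples, assuming each simulated VACUUM scans the same 2,000 pages and happens to find them empty. Each individual estimate is only slightly off, but because the result is written back and becomes the next run's old_rel_tuples, the bias compounds:

/*
 * Standalone restatement of the quoted calculation, for illustration only.
 * The table size, scan pattern, and function name are all hypothetical.
 */
#include <math.h>
#include <stdio.h>

static double
estimate_reltuples(double old_rel_pages, double old_rel_tuples,
                   double total_pages, double scanned_pages,
                   double scanned_tuples)
{
    double old_density = old_rel_tuples / old_rel_pages;
    double unscanned_pages = total_pages - scanned_pages;
    double total_tuples = old_density * unscanned_pages + scanned_tuples;

    return floor(total_tuples + 0.5);
}

int
main(void)
{
    double total_pages = 100000.0;
    double reltuples = 10000000.0;  /* what pg_class starts out saying */

    for (int run = 1; run <= 20; run++)
    {
        /* each VACUUM scans the same 2,000 pages and finds 0 live tuples */
        reltuples = estimate_reltuples(total_pages, reltuples,
                                       total_pages, 2000.0, 0.0);
        printf("after VACUUM %2d: reltuples = %.0f\n", run, reltuples);
    }
    return 0;
}

Compiled as-is (link with -lm), that prints reltuples shrinking from 9,800,000 after the first run to roughly 6.7 million after twenty, even though the table itself never changed -- each run only sees its own small delta, so nothing ever pulls the estimate back.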
My assumption has always been that vac_estimate_reltuples() is prone to issues like this because it just doesn't have access to very much information each time it runs. It can only see the delta between what VACUUM just saw, and what the last VACUUM (or possibly the last ANALYZE) saw according to pg_class.

You're always going to find weaknesses in such a model if you go looking for them. You're always going to find a way to salami slice your way from good information to total nonsense, if you pick the right (or wrong) test case -- one that runs VACUUM in a way that allows whatever bias there may be to accumulate. It's sort of like the way floating point values can become very inaccurate through a process that allows many small inaccuracies to accumulate over time.

Maybe you're right to be concerned to the degree that you're concerned -- I'm not sure. I'm just adding what I see as important context.

-- 
Peter Geoghegan