On Thu, Apr 25, 2024 at 10:24 PM Laurenz Albe <laurenz.a...@cybertec.at> wrote:
> I don't find that convincing. Why are 2TB of wasted space in a 10TB
> table worse than 2TB of wasted space in 100 tables of 100GB each?

It's not worse, but it's more avoidable. No matter what you do, any
table that suffers a reasonable number of updates and/or deletes is
going to have some wasted space. When a tuple is deleted or updated,
the old one has to stick around until its xmax is all-visible, and
then after that until the page is HOT pruned, which may not happen
immediately, and then even after that the line pointer sticks around
until the next vacuum, which doesn't happen instantly either. No
matter how aggressive you make autovacuum, or even no matter how
aggressively you vacuum manually, non-insert-only tables are always
going to end up containing some bloat.

But how much? Well, it's basically given by
RATE_AT_WHICH_SPACE_IS_WASTED * AVERAGE_TIME_UNTIL_SPACE_IS_RECLAIMED.
Which, you'll note, does not really depend on the table size. It does
a little bit, because the time until a tuple is fully removed,
including the line pointer, depends on how long vacuum takes, and
vacuum takes longer on a big table than on a small one. But the
effect is much less than linear, I believe, because you can HOT-prune
as soon as the xmax is all-visible, which reclaims most of the space
instantly. So in practice, the minimum feasible steady-state bloat
for a table depends a great deal on how fast updates and deletes are
happening, but only weakly on the size of the table.
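To make the arithmetic concrete, here's a toy model of that formula
in Python. The churn rate and reclaim latency below are made-up
numbers for illustration, not measurements of anything:

# Steady-state bloat is churn rate times average reclaim latency;
# note that the size of the table never enters the calculation.
def steady_state_bloat_mb(waste_mb_per_min, reclaim_latency_min):
    return waste_mb_per_min * reclaim_latency_min

# Assume 1MB/min of churn and about an hour until space is reclaimed.
bloat_mb = steady_state_bloat_mb(1, 60)

# Same absolute bloat, wildly different relative bloat by table size.
for table_mb in (10, 10_000, 10_000_000):  # 10MB, 10GB, 10TB
    print(f"{table_mb} MB table: {bloat_mb / table_mb:.4%} bloat")

The same churn produces 0.0006% bloat on the 10TB table and 600% on
the 10MB one, which is the asymmetry described above.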
Which, in plain English, means that you should be able to vacuum a
10TB table often enough that it doesn't accumulate 2TB of bloat, if
you want to. It's going to be harder to vacuum a 10GB table often
enough that it doesn't accumulate 2GB of bloat. And it's going to be
*really* hard to vacuum a 10MB table often enough that it doesn't
accumulate 2MB of bloat. The only way you're going to be able to do
that last one at all is if the update rate is very low.

> > Another reason, at least in existing releases, is that at some
> > point index vacuuming hits a wall because we run out of space for
> > dead tuples. We *most definitely* want to do index vacuuming
> > before we get to the point where we're going to have to do
> > multiple cycles of index vacuuming.
>
> That is more convincing. But do we need a GUC for that? What about
> making a table eligible for autovacuum as soon as the number of
> dead tuples reaches 90% of what you can hold in
> "autovacuum_work_mem"?

That would have been a good idea to do in existing releases, a long
time before now, but we didn't. However, the new dead TID store
changes the picture, because if I understand John Naylor's remarks
correctly, the new TID store can hold so many TIDs so efficiently
that you basically won't run out of memory. So now I think this
wouldn't be effective - yet I still think it's wrong to let the
vacuum threshold scale without bound as the table size increases.

--
Robert Haas
EDB: http://www.enterprisedb.com
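For reference on that closing point, the trigger condition at issue
is, in effect, the following (a Python paraphrase of autovacuum's
threshold check using the default GUC values; the real logic lives in
autovacuum.c):

# Dead tuples needed before autovacuum triggers, with the defaults
# autovacuum_vacuum_threshold = 50 and
# autovacuum_vacuum_scale_factor = 0.2.
def autovacuum_trigger_threshold(reltuples,
                                 base_threshold=50,
                                 scale_factor=0.2):
    # Grows linearly with table size, with no upper bound.
    return base_threshold + scale_factor * reltuples

for reltuples in (10_000, 10_000_000, 10_000_000_000):
    threshold = autovacuum_trigger_threshold(reltuples)
    print(f"{reltuples:>14,} rows -> {threshold:>14,.0f} dead tuples")

A ten-billion-row table has to accumulate two billion dead tuples
before autovacuum even wakes up, which is the unbounded scaling being
objected to.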