On Wed, May 1, 2024 at 2:19 PM Imseih (AWS), Sami <sims...@amazon.com> wrote:
> > Unless I'm missing something major, that's completely bonkers. It
> > might be true that it would be a good idea to vacuum such a table more
> > often than we do at present, but there's no shot that we want to do it
> > that much more often.
>
> This is really an important point.
>
> Too small of a threshold and a/v will constantly be vacuuming a fairly
> large and busy table with many indexes.
>
> If the threshold is large, say 100 or 200 million, I question whether
> you want autovacuum to be doing the work of cleanup here. That long a
> period without an autovacuum on a table means there may be something
> misconfigured in your autovacuum settings.
>
> At that point, aren't you just better off performing a manual vacuum
> and taking advantage of parallel index scans?
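For context, the parallel index processing Sami refers to is currently available only through a manual VACUUM, not through autovacuum. A minimal sketch, with a hypothetical table name:

    -- Vacuum big_table, using up to 4 workers to process its indexes
    -- in parallel (one worker per eligible index, capped by
    -- max_parallel_maintenance_workers; available since PostgreSQL 13).
    VACUUM (PARALLEL 4, VERBOSE) big_table;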
As far as that last point goes, it would be good if we taught autovacuum about several things it doesn't currently know about; parallelism is one. IMHO, it's probably not the most important one, but it's certainly on the list. I think, though, that we should confine ourselves on this thread to talking about what the threshold ought to be.

And as far as that goes, I'd like you - and others - to spell out more precisely why you think 100 or 200 million tuples is too much. It might be, or maybe it is in some cases but not in others. To me, that's not a terribly large amount of data. Unless your tuples are very wide, it's a few tens of gigabytes. That is big enough that I can believe that you *might* want autovacuum to run when you hit that threshold, but it's definitely not *obvious* to me that you want autovacuum to run when you hit that threshold. To make that concrete: if the table is 10TB, do you want to vacuum it to reclaim 20GB of bloat? You might be vacuuming 5TB of indexes to reclaim 20GB of heap space - is that the right thing to do? If yes, why?

I do think it's interesting that other people seem to think we should be vacuuming more often on tables that are substantially smaller than the ones that seem like a big problem to me. I'm happy to admit that my knowledge of this topic is not comprehensive and I'd like to learn from the experience of others. But I think it's clearly and obviously unworkable to multiply the current frequency of vacuuming for large tables by a three or four digit number.

Possibly what we need here is something other than a cap, where, say, we vacuum a 10GB table twice as often as now, a 100GB table four times as often, and a 1TB table eight times as often. Or whatever the right answer is. But we can't just pull numbers out of the air like that: we need to be able to justify our choices. I think we all agree that big tables need to be vacuumed more often than the current formula does, but we seem to be rather far apart on the values of "big" and "more".

--
Robert Haas
EDB: http://www.enterprisedb.com
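For concreteness, the trigger condition under discussion is, with the stock settings (autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.2), dead tuples > 50 + 0.2 * reltuples, so a billion-tuple table waits for roughly 200 million dead tuples before autovacuum acts. A quick sketch of where that puts existing tables, assuming the defaults and ignoring any per-table reloptions:

    -- Approximate dead-tuple count at which autovacuum would trigger,
    -- assuming the default threshold (50) and scale factor (0.2).
    -- Note: reltuples is only an estimate, and is -1 for tables that
    -- have never been vacuumed or analyzed.
    SELECT c.relname,
           c.reltuples,
           50 + 0.2 * c.reltuples AS trigger_dead_tuples
    FROM pg_class c
    WHERE c.relkind = 'r'
    ORDER BY c.reltuples DESC
    LIMIT 10;

Note that the doubling schedule sketched above (2x at 10GB, 4x at 100GB, 8x at 1TB) amounts to halving the effective scale factor for each tenfold increase in table size: for a fixed rate of dead-tuple generation, halving the trigger threshold doubles the vacuum frequency.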