On Wed, May 1, 2024 at 2:19 PM Imseih (AWS), Sami <sims...@amazon.com> wrote:
> > Unless I'm missing something major, that's completely bonkers. It
> > might be true that it would be a good idea to vacuum such a table more
> > often than we do at present, but there's no shot that we want to do it
> > that much more often.
>
> This is really an important point.
>
> Too small of a threshold and a/v will constantly be vacuuming a fairly
> large and busy table with many indexes.
>
> If the threshold is large, say 100 or 200 million, I question whether
> you want autovacuum to be doing the work of cleanup here. That long a
> period without an autovacuum on a table means there may be something
> misconfigured in your autovacuum settings.
>
> At that point, aren't you just better off performing a manual vacuum
> and taking advantage of parallel index scans?
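For context, the parallel index processing Sami refers to is currently available only through a manual VACUUM, not through autovacuum. A minimal sketch, with a hypothetical table name:

    -- Vacuum big_table, using up to 4 workers to process its indexes
    -- in parallel (one worker per eligible index, capped by
    -- max_parallel_maintenance_workers; available since PostgreSQL 13).
    VACUUM (PARALLEL 4, VERBOSE) big_table;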
As far as that last point goes, it would be good if we taught autovacuum about several things it doesn't currently know about; parallelism is one. IMHO, it's probably not the most important one, but it's certainly on the list. I think, though, that we should confine ourselves on this thread to talking about what the threshold ought to be.

And as far as that goes, I'd like you - and others - to spell out more precisely why you think 100 or 200 million tuples is too much. It might be, or maybe it is in some cases but not in others. To me, that's not a terribly large amount of data. Unless your tuples are very wide, it's a few tens of gigabytes. That is big enough that I can believe that you *might* want autovacuum to run when you hit that threshold, but it's definitely not *obvious* to me that you want autovacuum to run when you hit that threshold. To make that concrete: if the table is 10TB, do you want to vacuum it to reclaim 20GB of bloat? You might be vacuuming 5TB of indexes to reclaim 20GB of heap space - is that the right thing to do? If yes, why?

I do think it's interesting that other people seem to think we should be vacuuming more often on tables that are substantially smaller than the ones that seem like a big problem to me. I'm happy to admit that my knowledge of this topic is not comprehensive and I'd like to learn from the experience of others. But I think it's clearly and obviously unworkable to multiply the current frequency of vacuuming for large tables by a three or four digit number.

Possibly what we need here is something other than a cap, where, say, we vacuum a 10GB table twice as often as now, a 100GB table four times as often, and a 1TB table eight times as often. Or whatever the right answer is. But we can't just pull numbers out of the air like that: we need to be able to justify our choices. I think we all agree that big tables need to be vacuumed more often than the current formula does, but we seem to be rather far apart on the values of "big" and "more".

--
Robert Haas
EDB: http://www.enterprisedb.com
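For concreteness, the trigger condition under discussion is, with the stock settings (autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.2), dead tuples > 50 + 0.2 * reltuples, so a billion-tuple table waits for roughly 200 million dead tuples before autovacuum acts. A quick sketch of where that puts existing tables, assuming the defaults and ignoring any per-table reloptions:

    -- Approximate dead-tuple count at which autovacuum would trigger,
    -- assuming the default threshold (50) and scale factor (0.2).
    -- Note: reltuples is only an estimate, and is -1 for tables that
    -- have never been vacuumed or analyzed.
    SELECT c.relname,
           c.reltuples,
           50 + 0.2 * c.reltuples AS trigger_dead_tuples
    FROM pg_class c
    WHERE c.relkind = 'r'
    ORDER BY c.reltuples DESC
    LIMIT 10;

Note that the doubling schedule sketched above (2x at 10GB, 4x at 100GB, 8x at 1TB) amounts to halving the effective scale factor for each tenfold increase in table size: for a fixed rate of dead-tuple generation, halving the trigger threshold doubles the vacuum frequency.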