Hi,

> Hi,
>
> On 2024-06-17 15:39:27 -0400, Robert Haas wrote:
>> As I mentioned in my talk at 2024.pgconf.dev, I think that the biggest
>> problem with autovacuum as it exists today is that the cost delay is
>> sometimes too low to keep up with the amount of vacuuming that needs
>> to be done.
>
> I agree it's a big problem, not sure it's *the* problem. But I'm happy to see
> it improved anyway, so it doesn't really matter.

In my experience, another big problem is the way we trigger an autovacuum
on a relation. With the current strategy, if we have lots of writes between
9:00 AM ~ 5:00 PM, an autovacuum is more likely to be triggered during that
window, which is also the application's peak time. If we could instead run
vacuum at an off-peak time, say 00:00 AM ~ 05:00 AM, it would be unlikely
to cause any issue even if it used lots of resources.
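To make the idea concrete, the scheduling test could be as simple as an
hour-of-day window check. Below is a minimal standalone sketch; the window
bounds stand in for hypothetical GUCs, and nothing like this exists in core
today:

    #include <stdbool.h>
    #include <time.h>

    /*
     * Sketch only: return true when the current local time falls inside
     * an off-peak window [start_hour, end_hour).  The only subtle part
     * is a window that wraps past midnight, e.g. 22:00 ~ 05:00.
     */
    static bool
    in_offpeak_window(int start_hour, int end_hour)
    {
        time_t      now = time(NULL);
        struct tm   tm;

        localtime_r(&now, &tm);

        if (start_hour <= end_hour)     /* e.g. 00:00 ~ 05:00 */
            return tm.tm_hour >= start_hour && tm.tm_hour < end_hour;
        else                            /* wraps midnight, e.g. 22:00 ~ 05:00 */
            return tm.tm_hour >= start_hour || tm.tm_hour < end_hour;
    }

    int
    main(void)
    {
        /* e.g. only allow the more aggressive behavior between 00:00 and 05:00 */
        return in_offpeak_window(0, 5) ? 0 : 1;
    }

The launcher could consult such a check to defer non-wraparound vacuums, or
just to switch to a more aggressive cost limit inside the window; a
crontab-like external scheduler flipping the cost-limit GUCs at the window
edges would achieve roughly the same thing.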
> One issue around all of this is that we pretty much don't have the tools to
> analyze autovacuum behaviour across a larger number of systems in a realistic
> way :/. I find my own view of what precisely the problem is being heavily
> swayed by the last few problematic cases I've looked at.
>
>> I think we might be able to get fairly far by observing that if the
>> number of running autovacuum workers is equal to the maximum allowable
>> number of running autovacuum workers, that may be a sign of trouble,
>> and the longer that situation persists, the more likely it is that
>> we're in trouble. So, a very simple algorithm would be: If the maximum
>> number of workers have been running continuously for more than, say,
>> 10 minutes, assume we're falling behind and exempt all workers from
>> the cost limit for as long as the situation persists. One could
>> criticize this approach on the grounds that it causes a very sudden
>> behavior change instead of, say, allowing the rate of vacuuming to
>> gradually increase. I'm curious to know whether other people think
>> that would be a problem.
>
> Another issue is that it's easy to fall behind due to cost limits on systems
> where autovacuum_max_workers is smaller than the number of busy tables.
>
> IME one common situation is to have a single table that's being vacuumed too
> slowly due to cost limits, with everything else keeping up easily.
>
>> I think it might be OK, for a couple of reasons:
>>
>> 1. I'm unconvinced that the vacuum_cost_delay system actually prevents
>> very many problems. I've fixed a lot of problems by telling users to
>> raise the cost limit, but virtually never by lowering it. When we
>> lowered the delay by an order of magnitude a few releases ago -
>> equivalent to increasing the cost limit by an order of magnitude - I
>> didn't personally hear any complaints about that causing problems. So
>> disabling the delay completely some of the time might just be fine.
>
> I have seen disabling cost limits cause replication setups to fall over
> because the amount of WAL increases beyond what can be
> replicated/archived/replayed. It's very easy to reach the issue when syncrep
> is enabled.

Usually applications have off-peak hours; if we can exploit that
characteristic, we might get good results. I know it is hard to do in
PostgreSQL core, but I once tried it in an external system (an external
monitor plus a crontab-like scheduler), and I could see autovacuum's
CPU/memory usage drop a lot during the daytime (the application's peak
time).

>> 1a. Incidentally, when I have seen problems because of vacuum running
>> "too fast", it's not been because it was using up too much I/O
>> bandwidth, but because it's pushed too much data out of cache too
>> quickly. A long overnight vacuum can evict a lot of pages from the
>> system page cache by morning - the ring buffer only protects our
>> shared_buffers, not the OS cache. I don't think this can be fixed by
>> rate-limiting vacuum, though: to keep the cache eviction at a level
>> low enough that you could be certain of not causing trouble, you'd
>> have to limit it to an extremely low rate which would just cause
>> vacuuming not to keep up. The cure would be worse than the disease at
>> that point.
>
> I've seen versions of this too. Ironically it's often made way worse by
> ringbuffers, because even if there is space in shared buffers, we'll not move
> buffers there, instead putting a lot of pressure on the OS page cache.

I can understand the pressure on the OS page cache, but I thought the OS
page cache could be reclaimed easily for other purposes; I am not sure what
serious issue it can cause.

> - Longrunning transaction prevents increasing relfrozenxid, we run autovacuum
>   over and over on the same relation, using up the whole cost budget. This is
>   particularly bad because often we'll not autovacuum anything else, building
>   up a larger and larger backlog of actual work.

Could we maintain a pg_class.last_autovacuum_min_xid column during vacuum,
and compare the current OldestXmin with it before doing the real work? If
the horizon has not advanced since the last run, vacuuming the relation
again cannot make progress and could be skipped. I think we could use an
in-place update when maintaining pg_class.last_autovacuum_min_xid, to avoid
creating too many versions of the pg_class tuple.
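Here is a minimal standalone sketch of that pre-check;
last_autovacuum_min_xid is the hypothetical column proposed above, and the
comparison mirrors the modulo-2^31 arithmetic that TransactionIdPrecedes()
uses:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId;

    /* Wraparound-aware "id1 precedes id2", the same modulo-2^31
     * arithmetic as TransactionIdPrecedes(). */
    static bool
    xid_precedes(TransactionId id1, TransactionId id2)
    {
        return (int32_t) (id1 - id2) < 0;
    }

    /*
     * Hypothetical pre-check: only vacuum when the oldest xmin has
     * advanced past the horizon recorded by the previous autovacuum,
     * i.e. when there is a chance of freezing or removing more tuples
     * than last time.
     */
    static bool
    vacuum_can_make_progress(TransactionId oldest_xmin,
                             TransactionId last_autovacuum_min_xid)
    {
        return xid_precedes(last_autovacuum_min_xid, oldest_xmin);
    }

    int
    main(void)
    {
        /* Horizon unchanged since the last run: skip the relation. */
        printf("%d\n", vacuum_can_make_progress(1000, 1000)); /* prints 0 */
        /* Horizon advanced: vacuuming may now do useful work. */
        printf("%d\n", vacuum_can_make_progress(1100, 1000)); /* prints 1 */
        return 0;
    }

I think an in-place update would match how vacuum already maintains
relpages/reltuples in pg_class, with the caveat that the value is
non-transactional and so could only be treated as a hint.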
> - Tables, where on-access pruning works very well, end up being vacuumed far
>   too frequently, because our autovacuum scheduling doesn't know about tuples
>   having been cleaned up by on-access pruning.

Good to know about this case. If we updated the pg_stats_xx metrics during
on-access pruning, would that be helpful here?

> - Larger tables with occasional lock conflicts cause autovacuum to be
>   cancelled and restart from scratch over and over. If that happens before
>   the second table scan, this can easily eat up the whole cost budget without
>   making forward progress.

Off-peak time + manual vacuum should be helpful here, I think.

--
Best Regards
Andy Fan