Hi,

> Hi,
>
> On 2024-06-17 15:39:27 -0400, Robert Haas wrote:
>> As I mentioned in my talk at 2024.pgconf.dev, I think that the biggest
>> problem with autovacuum as it exists today is that the cost delay is
>> sometimes too low to keep up with the amount of vacuuming that needs
>> to be done.
>
> I agree it's a big problem, not sure it's *the* problem. But I'm happy to see
> it improved anyway, so it doesn't really matter.

In my experience, another big problem is the way we trigger an autovacuum
on a relation. With the current strategy, if we have lots of writes between
9:00 AM ~ 5:00 PM, an autovacuum is more likely to be triggered during that
window, which is also the application's peak time. If we could instead run
vacuum at an off-peak time, say 00:00 AM ~ 05:00 AM, it would be unlikely
to cause any issue even if it used lots of resources.
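To make the idea concrete, the scheduling test could be as simple as an
hour-of-day window check. Below is a minimal standalone sketch; the window
bounds stand in for hypothetical GUCs, and nothing like this exists in core
today:

    #include <stdbool.h>
    #include <time.h>

    /*
     * Sketch only: return true when the current local time falls inside
     * an off-peak window [start_hour, end_hour).  The only subtle part
     * is a window that wraps past midnight, e.g. 22:00 ~ 05:00.
     */
    static bool
    in_offpeak_window(int start_hour, int end_hour)
    {
        time_t      now = time(NULL);
        struct tm   tm;

        localtime_r(&now, &tm);

        if (start_hour <= end_hour)     /* e.g. 00:00 ~ 05:00 */
            return tm.tm_hour >= start_hour && tm.tm_hour < end_hour;
        else                            /* wraps midnight, e.g. 22:00 ~ 05:00 */
            return tm.tm_hour >= start_hour || tm.tm_hour < end_hour;
    }

    int
    main(void)
    {
        /* e.g. only allow the more aggressive behavior between 00:00 and 05:00 */
        return in_offpeak_window(0, 5) ? 0 : 1;
    }

The launcher could consult such a check to defer non-wraparound vacuums, or
just to switch to a more aggressive cost limit inside the window; a
crontab-like external scheduler flipping the cost-limit GUCs at the window
edges would achieve roughly the same thing.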
> One issue around all of this is that we pretty much don't have the tools to
> analyze autovacuum behaviour across a larger number of systems in a realistic
> way :/. I find my own view of what precisely the problem is being heavily
> swayed by the last few problematic cases I've looked at.
>
>> I think we might be able to get fairly far by observing that if the
>> number of running autovacuum workers is equal to the maximum allowable
>> number of running autovacuum workers, that may be a sign of trouble,
>> and the longer that situation persists, the more likely it is that
>> we're in trouble. So, a very simple algorithm would be: If the maximum
>> number of workers have been running continuously for more than, say,
>> 10 minutes, assume we're falling behind and exempt all workers from
>> the cost limit for as long as the situation persists. One could
>> criticize this approach on the grounds that it causes a very sudden
>> behavior change instead of, say, allowing the rate of vacuuming to
>> gradually increase. I'm curious to know whether other people think
>> that would be a problem.
>
> Another issue is that it's easy to fall behind due to cost limits on systems
> where autovacuum_max_workers is smaller than the number of busy tables.
>
> IME one common situation is to have a single table that's being vacuumed too
> slowly due to cost limits, with everything else keeping up easily.
>
>> I think it might be OK, for a couple of reasons:
>>
>> 1. I'm unconvinced that the vacuum_cost_delay system actually prevents
>> very many problems. I've fixed a lot of problems by telling users to
>> raise the cost limit, but virtually never by lowering it. When we
>> lowered the delay by an order of magnitude a few releases ago -
>> equivalent to increasing the cost limit by an order of magnitude - I
>> didn't personally hear any complaints about that causing problems. So
>> disabling the delay completely some of the time might just be fine.
>
> I have seen disabling cost limits cause replication setups to fall over
> because the amount of WAL increases beyond what can be
> replicated/archived/replayed. It's very easy to reach the issue when syncrep
> is enabled.

Usually applications have off-peak hours; if we can exploit that
characteristic, we might get good results. I know it is hard to do in
PostgreSQL core, but I once tried it in an external system (an external
monitor plus a crontab-like scheduler), and I could see autovacuum's
CPU/memory usage drop a lot during the daytime (the application's peak
time).

>> 1a. Incidentally, when I have seen problems because of vacuum running
>> "too fast", it's not been because it was using up too much I/O
>> bandwidth, but because it's pushed too much data out of cache too
>> quickly. A long overnight vacuum can evict a lot of pages from the
>> system page cache by morning - the ring buffer only protects our
>> shared_buffers, not the OS cache. I don't think this can be fixed by
>> rate-limiting vacuum, though: to keep the cache eviction at a level
>> low enough that you could be certain of not causing trouble, you'd
>> have to limit it to an extremely low rate which would just cause
>> vacuuming not to keep up. The cure would be worse than the disease at
>> that point.
>
> I've seen versions of this too. Ironically it's often made way worse by
> ringbuffers, because even if there is space in shared buffers, we'll not move
> buffers there, instead putting a lot of pressure on the OS page cache.

I can understand the pressure on the OS page cache, but I thought the OS
page cache could be reclaimed easily for other purposes; I am not sure what
serious issue it can cause.

> - Longrunning transaction prevents increasing relfrozenxid, we run autovacuum
>   over and over on the same relation, using up the whole cost budget. This is
>   particularly bad because often we'll not autovacuum anything else, building
>   up a larger and larger backlog of actual work.

Could we maintain a pg_class.last_autovacuum_min_xid column during vacuum,
and compare the current OldestXmin with it before doing the real work? If
the horizon has not advanced since the last run, vacuuming the relation
again cannot make progress and could be skipped. I think we could use an
in-place update when maintaining pg_class.last_autovacuum_min_xid, to avoid
creating too many versions of the pg_class tuple.
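Here is a minimal standalone sketch of that pre-check;
last_autovacuum_min_xid is the hypothetical column proposed above, and the
comparison mirrors the modulo-2^31 arithmetic that TransactionIdPrecedes()
uses:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId;

    /* Wraparound-aware "id1 precedes id2", the same modulo-2^31
     * arithmetic as TransactionIdPrecedes(). */
    static bool
    xid_precedes(TransactionId id1, TransactionId id2)
    {
        return (int32_t) (id1 - id2) < 0;
    }

    /*
     * Hypothetical pre-check: only vacuum when the oldest xmin has
     * advanced past the horizon recorded by the previous autovacuum,
     * i.e. when there is a chance of freezing or removing more tuples
     * than last time.
     */
    static bool
    vacuum_can_make_progress(TransactionId oldest_xmin,
                             TransactionId last_autovacuum_min_xid)
    {
        return xid_precedes(last_autovacuum_min_xid, oldest_xmin);
    }

    int
    main(void)
    {
        /* Horizon unchanged since the last run: skip the relation. */
        printf("%d\n", vacuum_can_make_progress(1000, 1000)); /* prints 0 */
        /* Horizon advanced: vacuuming may now do useful work. */
        printf("%d\n", vacuum_can_make_progress(1100, 1000)); /* prints 1 */
        return 0;
    }

I think an in-place update would match how vacuum already maintains
relpages/reltuples in pg_class, with the caveat that the value is
non-transactional and so could only be treated as a hint.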
> - Tables, where on-access pruning works very well, end up being vacuumed far
>   too frequently, because our autovacuum scheduling doesn't know about tuples
>   having been cleaned up by on-access pruning.

Good to know about this case. If we updated the pg_stats_xx metrics during
on-access pruning, would that be helpful here?

> - Larger tables with occasional lock conflicts cause autovacuum to be
>   cancelled and restart from scratch over and over. If that happens before
>   the second table scan, this can easily eat up the whole cost budget without
>   making forward progress.

Off-peak time + manual vacuum should be helpful here, I think.

--
Best Regards
Andy Fan