On Fri, Jan 24, 2025 at 9:15 AM Melanie Plageman
<melanieplage...@gmail.com> wrote:
> So, in this case, there is only one table in question, so 1 autovacuum
> worker (and up to 2 maintenance parallel workers for index vacuuming).
> The duration I provided is just the absolute duration from start of
> vacuum to finish -- not considering the amount of time each parallel
> worker may have been working (also it includes time spent delaying).
> The benchmark ran for 2.8 hours. I configured vacuum to run
> frequently. In this case, master spent 47% of the total time vacuuming
> and the patch spent 56%.
Definitely not insignificant, but I think it's OK for a worst case.
Autovacuum is a background process, so it's not like a 20% regression
on query performance.

> Inherent in frontloading work is wasting it if it turns out the work
> is unneeded. Unneeded work is from one of two sources: 1) we failed to
> freeze the page, or 2) we succeed in freezing the page but then the
> page is unfrozen before the next aggressive vacuum. Avoiding 1 would
> require knowledge about the distribution of page ages throughout the
> relation that we decided was too expensive to get and store. Avoiding
> 2 would require prescience about the future of the workload. We found
> building and storing a model to make predictions like that too
> complicated, error-prone, and expensive.

Well, the algorithm has guards against doing too much of (1). I think
that's really important. One of the really bad things about the AV
algorithm in general is that it will happily keep retrying VACUUM on
tables where there's no chance of removing any more tuples, because no
relevant transactions have ended since the last time we vacuumed. But
this patch stops trying to do the extra freezing if it sees that it
isn't working out -- and the thresholds are pretty tight. We could
make them even tighter, but it's already the case, I think, that after
a pretty modest number of failed freeze attempts, we stop trying to
freeze things.

As far as (2) goes, I'm open to the idea that this can be further
improved in the future, but I believe it will be really hard to do
better than looking at the time since last modification. The model you
proposed needed a fairly large amount of new statistics, and it still
couldn't handle something as simple as "half of the pages are modified
after X amount of time, and the other half are modified after X+Y
amount of time". I think what we'd really want, in order to make good
predictions, is to look at a Fourier transform of the
inter-modification times -- but that would require even more detailed
data than you were gathering, and that was already pushing the limits
of what was realistic.

In short, I feel like this algorithm is more vulnerable to (2) than
(1), but I agree that we can't do much better right now. It will be
interesting to see what users think of the results, assuming this does
go forward. My intuition is that large amounts of VACUUM work that
happen when an aggressive VACUUM is triggered are so much more painful
for users than an ordinary non-aggressive VACUUM being a bit more
expensive that this should be a win overall, even if the total effort
expended is a bit greater than now -- but my intuition is sometimes
wrong. I don't think we're going to find out without something being
committed, though.

--
Robert Haas
EDB: http://www.enterprisedb.com
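
[Editor's sketch] To make the "back off when it isn't working out" guard
discussed above concrete, here is a minimal sketch of one way such a
freeze-failure cap could be structured. The struct, field names, and the
idea of a per-table counter are hypothetical, invented for illustration;
they are not taken from the patch or from the PostgreSQL source.

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Hypothetical per-table bookkeeping for opportunistic ("eager")
     * freezing during a non-aggressive vacuum.  All names and the idea
     * of a flat failure cap are assumptions for this sketch.
     */
    typedef struct EagerFreezeState
    {
        int64_t eager_attempts;  /* pages we tried to freeze eagerly */
        int64_t eager_failures;  /* attempts where the page could not be frozen */
        int64_t failure_cap;     /* stop being eager after this many failures */
    } EagerFreezeState;

    /* Should this vacuum keep attempting opportunistic freezing? */
    static bool
    eager_freezing_enabled(const EagerFreezeState *state)
    {
        return state->eager_failures < state->failure_cap;
    }

    /* Record the outcome of one opportunistic freeze attempt. */
    static void
    record_eager_freeze(EagerFreezeState *state, bool froze_page)
    {
        state->eager_attempts++;
        if (!froze_page)
            state->eager_failures++;
    }

Under this kind of scheme, a table whose pages keep turning out to be
unfreezable only pays the extra freezing cost for a bounded number of
failed attempts per vacuum, rather than for the whole pass, which is the
property the message above is describing.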