On Thu, Jan 23, 2025 at 12:16 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Wed, Jan 22, 2025 at 5:48 PM Melanie Plageman
> <melanieplage...@gmail.com> wrote:
> > Circling back to benchmarking, I've been running the most adversarial
> > benchmarks I could devise and can share a bit of what I've found.
> >
> > I created a "hot tail" benchmark where 16 clients insert some data and
> > then update some data older than what they just inserted but still
> > towards the end of the relation. The adversarial part is that I bulk
> > delete all the data older than X hours where X hours is always after
> > the data is eligible to be frozen but before it would be aggressively
> > vacuumed.
> >
> > That means that there are a bunch of pages that will never be frozen
> > on master but are frozen with the patch -- wasting vacuum resources. I
> > tuned vacuum_freeze_min_age and vacuum_freeze_table_age and picked the
> > DELETE window to specifically have this behavior.
> >
> > With my patch, I do see a 15-20% increase in the total time spent
> > vacuuming over the course of the multi-hour benchmark. (I only see a
> > 1% increase in the total WAL volume, though.)
>
> How much time is that in absolute terms? If the benchmark runs for 3
> hours and during that time we have 1 autovacuum worker active for 30
> minutes out of those 3 hours, that is different than if we have 5
> autovacuum workers active nearly all the time. Or, maybe a clearer way
> to put it, what percentage of the total work was the VACUUM work? If
> the total work was $100 and the VACUUM work accounted for $80 of that,
> then a 15-20% increase is pretty significant; if the total work was
> $100 and the VACUUM work accounted for $5 of that, then a 15-20%
> increase matters a lot less.
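(For concreteness, a workload with the shape described in the quote above
could be sketched roughly as the SQL below. The table name, column names,
reloption, freeze settings, and intervals are illustrative placeholders,
not the actual benchmark script or the values that were used.)

-- Illustrative schema and settings only; not the actual benchmark script.
CREATE TABLE hot_tail (
    id          bigserial PRIMARY KEY,
    client_id   int,
    inserted_at timestamptz DEFAULT now(),
    payload     text
);

-- Encourage frequent autovacuums of this table; value is illustrative.
ALTER TABLE hot_tail SET (autovacuum_vacuum_scale_factor = 0.01);

-- "Adversarial" tuning: rows become freeze-eligible quickly, but
-- aggressive vacuums are pushed out past the DELETE window.
-- These values are placeholders, not the benchmark's actual settings.
ALTER SYSTEM SET vacuum_freeze_min_age = 100000;
ALTER SYSTEM SET vacuum_freeze_table_age = 1000000000;
SELECT pg_reload_conf();

-- Each of the 16 clients loops over something like this: append new
-- rows, then update rows slightly older than the ones just inserted
-- but still near the end of the relation.
INSERT INTO hot_tail (client_id, payload)
    SELECT 1, repeat('x', 100) FROM generate_series(1, 100);

UPDATE hot_tail
   SET payload = payload
 WHERE inserted_at BETWEEN now() - interval '10 minutes'
                       AND now() - interval '5 minutes';

-- Periodic bulk delete of everything older than X hours, timed to land
-- after the rows are freeze-eligible but before an aggressive vacuum.
DELETE FROM hot_tail WHERE inserted_at < now() - interval '2 hours';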
So, in this case, there is only one table in question, so 1 autovacuum
worker (and up to 2 parallel maintenance workers for index vacuuming).
The duration I provided is just the absolute duration from the start of
vacuum to the finish -- not counting how long each parallel worker may
have been working (it also includes time spent delaying). The benchmark
ran for 2.8 hours, and I configured vacuum to run frequently. In this
case, master spent 47% of the total time vacuuming and the patch spent
56%.

There was a fair amount of run-to-run variance, which comes down to
vacuum and checkpoint timing. Andres suggested off-list that I rerun
the benchmarks with checksums and FPIs disabled to remove some of this
variation. That, of course, won't give accurate total time numbers, but
it should make the proportional increase more stable. I'll share some
results once I've done this.

> But tentatively I'm inclined to say this is fine.

Inherent in frontloading work is wasting it if it turns out the work is
unneeded. Unneeded work comes from one of two sources: 1) we failed to
freeze the page, or 2) we succeeded in freezing the page but the page
is unfrozen again before the next aggressive vacuum. Avoiding 1 would
require knowledge of the distribution of page ages throughout the
relation, which we decided was too expensive to get and store. Avoiding
2 would require prescience about the future of the workload; we found
building and storing a model to make predictions like that too
complicated, error-prone, and expensive. I think your algorithm is the
best we can do if we won't do either 1 or 2.

If this general kind of algorithm is the best we can do, then the only
levers we have are changing the caps for success and failure or
changing when we try to eager scan. I suspect the latter won't make
much of a difference (i.e., some ways will work better for some
workloads and worse for others). For the former, we have choices about
what we make configurable (e.g., the success cap, the failure cap, the
region size).

As for choosing the defaults, I did some experimentation, and it went
pretty much as expected -- raising the failure or success caps results
in more unneeded work and lowering them results in less. There are some
workloads where there is a sweet spot that results in the least
unneeded work for the most benefit -- but that point is totally
different for different workloads and configurations of Postgres.

> Some of the work
> that VACUUM is doing is probably work that otherwise would have needed
> to happen in the foreground. For instance, the speedup in DELETEs that
> you observed might be not only because the pages are cached but
> perhaps also because DELETE doesn't need to do any non-trivial
> visibility checks. Also, users don't have to anti-optimize their
> configuration settings for their workload as you did when constructing
> the adversarial case.

Yes, I hope users won't tune their vacuum_freeze_min_age very low if
they are going to delete all data older than an hour. The benchmarks
were mostly to look for unexpected interactions -- of which the DELETE
performance was one. But other things, like a massive increase in WAL
volume, did not happen.

- Melanie
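(As a rough way to watch how much of the frontloaded freezing is later
thrown away by the bulk DELETE, the pg_visibility extension can report
which pages are currently all-frozen. The queries below are only a
sketch and reuse the illustrative hot_tail table from the earlier
sketch; they are not part of the benchmark itself.)

-- Requires the pg_visibility extension; 'hot_tail' is the illustrative
-- table from the earlier sketch.
CREATE EXTENSION IF NOT EXISTS pg_visibility;

-- Relation-wide counts of all-visible and all-frozen pages.
SELECT all_visible, all_frozen
  FROM pg_visibility_map_summary('hot_tail');

-- Per-block view of the last ~1000 blocks: pages reported all_frozen
-- here that are later emptied by the bulk DELETE represent freezing
-- work that master would have skipped.
SELECT blkno, all_visible, all_frozen
  FROM pg_visibility_map('hot_tail')
 WHERE blkno >= pg_relation_size('hot_tail')
                / current_setting('block_size')::bigint - 1000
 ORDER BY blkno;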