On Sat, 27 Apr 2024 at 02:13, Robert Haas <robertmh...@gmail.com> wrote:
> Let's compare the current situation to the situation post-patch with a
> cap of 500k. Consider a table 1024 times larger than the one I
> mentioned above, so pgbench scale factor 25600, size on disk 320GB.
> Currently, that table will be vacuumed for bloat when the number of
> dead tuples exceeds 20% of the table size, because that's the default
> value of autovacuum_vacuum_scale_factor. The table has 2.56 billion
> tuples, so that means that we're going to vacuum it when there are
> more than 510 million dead tuples. Post-patch, we will vacuum when we
> have 500 thousand dead tuples. Suppose a uniform workload that slowly
> updates rows in the table. If we were previously autovacuuming the
> table once per day (1440 minutes) we're now going to try to vacuum it
> almost every minute (1440 minutes / 1024 = 84 seconds).
I've not checked your maths, but if that's true, that's not going to work.

I think there are fundamental problems with the parameters that drive
autovacuum that need to be addressed before we can consider a patch like
this one. Here are some of the problems that I know about:

1. Autovacuum has exactly zero forward vision and operates reactively
rather than proactively. This "blind operating" means a table either
doesn't need vacuuming or suddenly needs vacuuming, without any
consideration of how busy autovacuum is at that moment.

2. There is no prioritisation of the order in which tables are
autovacuumed.

3. With the default scale factor, the larger a table becomes, the less
frequent the autovacuums.

4. Autovacuum is more likely to trigger when the system is busy, because
more transaction IDs are being consumed and more DML is occurring. This
leaves autovacuum with less work to do during quiet periods, when there
are more free resources for doing the vacuum work.

In my opinion, the main problem with Frédéric's proposed GUC/reloption is
that it increases the workload that autovacuum is responsible for and,
because of #2, it becomes more likely that autovacuum works on some table
that isn't the highest-priority table, which can result in autovacuum
starvation of tables that are more important to vacuum now.

I think we need to do a larger overhaul of autovacuum to improve points
1-4 above. I also think there's some work coming up that might force us
into this sooner than we think. As far as I understand it, AIO will break
vacuum_cost_page_miss because everything (providing IO keeps up) will
become vacuum_cost_page_hit. Maybe that's not that important, as that
costing is quite terrible anyway.

Here's a sketch of an idea that's been in my head for a while:

Round 1:

1a) Give autovacuum forward vision (#1 above) and, instead of vacuuming a
table when it (atomically) crosses some threshold, use the existing scale
factors and autovacuum_freeze_max_age to give each table an autovacuum
"score", which could be a number from 0-100, where 0 means do nothing and
100 means nuclear meltdown. Let's say a table gets 10 points when its
dead tuples meet the current scale_factor, and maybe an additional point
for each 10% of the database's size that the table accounts for (giving
some weight to space recovery for larger tables). For relfrozenxid, make
the score the maximum of the dead-tuple score and the percentage that
age(relfrozenxid) is of 2 billion. Use a similar maximum-score
calculation for age(relminmxid) against 2 billion. (There's a rough code
sketch of this scoring further below.)

1b) Add a new GUC that defines the minimum score a table must reach
before autovacuum will consider it.

1c) Change autovacuum to vacuum the tables with the highest scores first.

Round 2:

2a) Have autovacuum monitor the score of the highest-scoring table over
time, with a bucket for each power of 2 seconds of history from now.
Let's say 20 buckets, about 12 days of history. Migrate scores into older
buckets to track the score over time.

2b) Have the autovacuum cost limits adapt according to that history, so
that if the maximum score of any table is trending upwards, autovacuum
speeds up until the score buckets trend downwards towards the present.

2c) Add another GUC to define the minimum score at which autovacuum will
be "proactive". It must be less than the minimum score for considering a
table for autovacuum (or at least be ignored unless it is). This GUC
would not cause an autovacuum speedup due to 2b), as we'd only consider
tables which meet the GUC added in 1b) in the score history array in 2a).
This stops autovacuum running faster than autovacuum_cost_limit when
trying to be proactive.
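To make 1a) and 2a) a bit more concrete, here's a rough standalone sketch
in C. The point values, the 2-billion cap and the power-of-2 bucket
indexing follow the description above; everything else (TableStats,
autovacuum_score, score_history_bucket, and all the numbers in main())
is invented purely for illustration and isn't meant to match any existing
autovacuum code.

/*
 * Rough sketch only: compute an autovacuum "score" for a table per 1a)
 * and show how the power-of-2 history buckets from 2a) could be indexed.
 * All names and constants are illustrative.
 */
#include <math.h>
#include <stdio.h>
#include <time.h>

#define MAX_XID_AGE             2000000000.0    /* the "2 billion" in 1a) */
#define SCORE_HISTORY_BUCKETS   20              /* 2^20 seconds ~= 12 days */

typedef struct TableStats
{
    double  dead_tuples;
    double  live_tuples;
    double  table_bytes;
    double  database_bytes;
    double  relfrozenxid_age;
    double  relminmxid_age;
} TableStats;

/* 1a) score: 0 means do nothing, 100 means nuclear meltdown */
static double
autovacuum_score(const TableStats *t, double scale_factor, double threshold)
{
    double  dead_score = 0.0;

    /* 10 points for dead tuples meeting the current scale_factor ... */
    if (t->dead_tuples > threshold + scale_factor * t->live_tuples)
    {
        dead_score = 10.0;
        /* ... plus a point per 10% of the database this table occupies */
        dead_score += 10.0 * (t->table_bytes / t->database_bytes);
    }

    /* wraparound components: percentage of the way to 2 billion */
    double  xid_score = 100.0 * (t->relfrozenxid_age / MAX_XID_AGE);
    double  mxid_score = 100.0 * (t->relminmxid_age / MAX_XID_AGE);

    /* take the maximum, clamped to the 0-100 range */
    return fmin(fmax(dead_score, fmax(xid_score, mxid_score)), 100.0);
}

/* 2a) which history bucket does an observation this old belong in? */
static int
score_history_bucket(time_t now, time_t observed_at)
{
    double  age = difftime(now, observed_at);
    int     bucket = 0;

    /* bucket 0 covers <2s ago, bucket 1 covers <4s, ... bucket 19 ~12 days */
    while (bucket < SCORE_HISTORY_BUCKETS - 1 &&
           age >= (double) (1 << (bucket + 1)))
        bucket++;
    return bucket;
}

int
main(void)
{
    /* loosely based on the 320GB, 2.56 billion row table quoted above,
     * assuming it makes up 80% of a 400GB database */
    TableStats  big = {
        .dead_tuples = 600000000.0, /* past the ~510 million (20%) mark */
        .live_tuples = 2560000000.0,
        .table_bytes = 320.0 * 1024 * 1024 * 1024,
        .database_bytes = 400.0 * 1024 * 1024 * 1024,
        .relfrozenxid_age = 150000000,  /* 7.5% of the way to wraparound */
        .relminmxid_age = 1000000
    };

    /* defaults: autovacuum_vacuum_scale_factor=0.2, _threshold=50 */
    printf("score = %.1f\n", autovacuum_score(&big, 0.2, 50));
    printf("an observation from an hour ago lands in bucket %d\n",
           score_history_bucket(time(NULL), time(NULL) - 3600));
    return 0;
}

The reason for taking the maximum rather than summing, per 1a), is that
wraparound danger alone should be able to push a table to the top of the
list even when it has almost no dead tuples.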
The above isn't a well-baked idea; the exact way to calculate the scores
isn't well thought through, certainly. However, I do think it's an idea
that we should consider and improve upon.

I believe 2c) helps solve the problem of large tables becoming bloated,
as autovacuum could get to these sooner when the workload is low enough
for it to run proactively.

I think we need at least 1a) before we can give autovacuum more work to
do, especially if we do something like multiply its workload by 1024x,
per your comment above.

David