On Thu, Apr 25, 2024 at 2:52 AM Frédéric Yhuel <frederic.yh...@dalibo.com> wrote:
>
> On 24/04/2024 at 21:10, Melanie Plageman wrote:
> > On Wed, Apr 24, 2024 at 8:08 AM Frédéric Yhuel
> > <frederic.yh...@dalibo.com> wrote:
> >>
> >> Hello,
> >>
> >> I would like to suggest a new parameter, autovacuum_max_threshold, which
> >> would set an upper limit on the number of tuples to delete/update/insert
> >> prior to vacuum/analyze.
> >
> > Hi Frédéric, thanks for the proposal! You are tackling a very tough
> > problem. I would also find it useful to know more about what led you
> > to suggest this particular solution. I am very interested in user
> > stories around difficulties with what tables are autovacuumed and
> > when.
>
> Hi Melanie! I can certainly start compiling user stories about that.
Cool! That would be very useful.

> >> The idea would be to replace the following calculation :
> >>
> >> vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
> >>
> >> with this one :
> >>
> >> vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples / (1
> >> + vac_scale_factor * reltuples / autovacuum_max_threshold)
> >>
> >> (and the same for the others, vacinsthresh and anlthresh).
> >
> > My first thought when reviewing the GUC and how it is used is
> > wondering if its description is a bit misleading.
> >
> > autovacuum_vacuum_threshold is the "minimum number of updated or
> > deleted tuples needed to trigger a vacuum". That is, if this many
> > tuples are modified, it *may* trigger a vacuum, but we also may skip
> > vacuuming the table for other reasons or due to other factors.
> > autovacuum_max_threshold's proposed definition is the upper
> > limit/maximum number of tuples to insert/update/delete prior to
> > vacuum/analyze. This implies that if that many tuples have been
> > modified or inserted, the table will definitely be vacuumed -- which
> > isn't true. Maybe that is okay, but I thought I would bring it up.
>
> I'm not too sure I understand. What are the reasons it might be skipped?
> I can think of a concurrent index creation on the same table, or
> anything holding a SHARE UPDATE EXCLUSIVE lock or above. Is this the
> sort of thing you are talking about?

No, I was thinking more literally that, if reltuples (assuming
reltuples is modified/inserted tuples) > autovacuum_max_threshold, I
would expect the table to be vacuumed. However, with your formula, that
wouldn't necessarily be true.

I think there are values of reltuples and autovacuum_max_threshold at
which

    reltuples > autovacuum_max_threshold

but

    reltuples <= vac_base_thresh + vac_scale_factor * reltuples /
        (1 + vac_scale_factor * reltuples / autovacuum_max_threshold)

I tried to reduce the formula to come up with a precise definition of
the range of values for which this is true, however I wasn't able to
reduce it to something nice. Here is just an example of a case:

    vac_base_thresh = 2000
    vac_scale_factor = 0.9
    reltuples = 3200
    autovacuum_max_threshold = 2500

    total_thresh = vac_base_thresh + vac_scale_factor * reltuples /
        (1 + vac_scale_factor * reltuples / autovacuum_max_threshold)

total_thresh: 3338. dead tuples: 3200. autovacuum_max_threshold: 2500

So there are more dead tuples than the max threshold, and it should
therefore trigger a vacuum, but it doesn't, because the total
calculated threshold is higher than the number of dead tuples.

This of course may not be a realistic scenario in practice. It works
best the closer the scale factor is to 1 (wish I had derived the
formula successfully) and when autovacuum_max_threshold > 2 *
vac_base_thresh. So, maybe it is not an issue.

> Perhaps a better name for the GUC would be
> autovacuum_asymptotic_limit... or something like that?

If we keep the asymptotic part, that makes sense. I wonder if we have
to add another "vacuum" in there (e.g. autovacuum_vacuum_max_threshold)
to be consistent with the other GUCs. I don't really know why they have
that extra "vacuum" in them, though. Makes the names so long.

- Melanie
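
For anyone who wants to reproduce the arithmetic above, here is a small
standalone C sketch (not PostgreSQL source -- the variable names merely
mirror the GUCs, and the values are the ones from the example, not the
defaults) that evaluates both the current and the proposed formulas:

    #include <stdio.h>

    int
    main(void)
    {
        float   vac_base_thresh = 2000;
        float   vac_scale_factor = 0.9f;
        float   reltuples = 3200;
        float   autovacuum_max_threshold = 2500;

        /* current calculation */
        float   current = vac_base_thresh + vac_scale_factor * reltuples;

        /* proposed calculation: the scale-factor term saturates toward the cap */
        float   proposed = vac_base_thresh + vac_scale_factor * reltuples /
            (1 + vac_scale_factor * reltuples / autovacuum_max_threshold);

        printf("current threshold:  %.0f\n", current);   /* 4880 */
        printf("proposed threshold: %.0f\n", proposed);  /* ~3338 */
        printf("dead tuples:        %.0f\n", reltuples); /* 3200 */

        return 0;
    }

With these inputs the current formula gives 4880 and the proposed one
about 3338, so the 3200 dead tuples exceed autovacuum_max_threshold
(2500) yet still fall short of the computed threshold -- the surprise
described above.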