(I had missed this discussion due to the mismatched thread subject.)

On Fri, May 14, 2021 at 11:14 AM Michael Paquier <mich...@paquier.xyz> wrote:
>
> On Thu, May 13, 2021 at 01:27:44PM -0700, Peter Geoghegan wrote:
> > Almost all of the benefit of the optimization is available with the
> > current BYPASS_THRESHOLD_PAGES threshold (2% of heap pages have
> > LP_DEAD items), which has less risk than a higher threshold. I don't
> > think it matters much if we have the occasional unnecessary round of
> > index vacuuming on account of not applying the optimization. The truly
> > important benefit of the optimization is to not do unnecessary index
> > vacuuming all the time in the extreme cases where it's really hard to
> > justify. There is currently zero evidence that anything higher than 2%
> > will ever help anybody to an appreciable degree. (There is also no
> > evidence that the optimization will ever need to be disabled, but I
> > accept the need to be conservative and offer an off switch -- the
> > precautionary principle applies when talking about new harms.)
> >
> > Not having to scan every index on every VACUUM, but only every 5th or
> > so VACUUM is a huge improvement. But going from every 5th VACUUM to
> > every 10th VACUUM? That's at best a tiny additional improvement in
> > exchange for what I'd guess is a roughly linear increase in risk
> > (maybe a greater-than-linear increase, even). That's an awful deal.
>
> Perhaps that's an awful deal, but based on which facts can you really
> say that this new behavior of needing at least 2% of relation pages
> with some dead items to clean up indexes is not a worse deal in some
> cases? This may cause more problems for the in-core index AMs, as
> much as it could impact any out-of-core index AM, no? What about
> other values like 1%, or even 5%? My guess is that there would be an
> ask to have more control on that, though that stands as my opinion.
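For reference, the bypass check being debated here boils down to something like the following simplified sketch (BYPASS_THRESHOLD_PAGES is the constant named above; the real vacuumlazy.c logic has additional conditions, so treat this purely as an illustration):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;           /* stand-in for PostgreSQL's BlockNumber */

#define BYPASS_THRESHOLD_PAGES  0.02    /* i.e. 2% of rel_pages */

/*
 * Return true if index vacuuming can be bypassed because too few heap
 * pages contain LP_DEAD items.
 */
static bool
bypass_index_vacuuming(BlockNumber rel_pages, BlockNumber lpdead_item_pages)
{
    BlockNumber threshold;

    if (rel_pages == 0)
        return false;

    threshold = (BlockNumber) (rel_pages * BYPASS_THRESHOLD_PAGES);
    return lpdead_item_pages < threshold;
}

For example, with rel_pages = 1,000,000 the bypass would apply only while fewer than 20,000 heap pages contain LP_DEAD items.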
I'm concerned about how users could tune such a scale-type parameter, configurable between 0.0 and 0.05. I think users basically don't pay attention to how many blocks are updated by UPDATE/DELETE. And unlike the old vacuum_cleanup_index_scale_factor, increasing this parameter would directly increase index bloat. If a user can accept more index bloat to speed up (auto)vacuum, they can use vacuum_index_cleanup instead. I would prefer a simple on/off switch just in case; I remember making the same comment before.

As Peter mentioned, we have discussed controlling whether the skipping optimization is enabled by adding a new mode to the INDEX_CLEANUP option. For example, a new "auto" (or "smart") mode could be the default, enabling all skipping optimizations, while specifying "on" would disable them. Or we could add a "force" mode that disables all skipping optimizations while leaving the existing modes as they are (a rough sketch of such a ternary setting follows below my signature). Either way, I don't think it's a good idea to add a new GUC parameter that we don't know how to tune.

IIUC, skipping index vacuuming when less than 2% of relation pages have at least one LP_DEAD item is a table-level optimization rather than a btree index optimization: since we are not likely to set many pages all-visible or to collect many dead tuples in that case, we can skip both index vacuuming and table vacuuming. Fortunately, IIUC this case also goes well together with btree indexes' bottom-up deletion. If this skipping behavior badly affects other index AMs, the optimization should instead be considered within btree indexes, although we would then need a way for index AMs to decide on and report their preferred vacuum strategy. But I guess this works well in most cases, so having an on/off switch should suffice.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
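P.S. Here is the rough sketch of a ternary INDEX_CLEANUP setting mentioned above. All names are invented purely for illustration (this is not committed code), just to picture how an "auto" default could coexist with explicit "on"/"off":

#include <stdbool.h>

typedef enum IndexCleanupMode
{
    INDEX_CLEANUP_AUTO,     /* proposed default: VACUUM may skip index vacuuming */
    INDEX_CLEANUP_ON,       /* always do index vacuuming/cleanup */
    INDEX_CLEANUP_OFF       /* never do index vacuuming/cleanup */
} IndexCleanupMode;

/* Decide whether this VACUUM should vacuum the table's indexes. */
static bool
do_index_vacuuming(IndexCleanupMode mode, bool below_bypass_threshold)
{
    if (mode == INDEX_CLEANUP_OFF)
        return false;
    if (mode == INDEX_CLEANUP_ON)
        return true;

    /* AUTO: apply the skipping optimization only below the threshold */
    return !below_bypass_threshold;
}

Under this scheme, today's behavior of always vacuuming indexes would still be available by spelling out INDEX_CLEANUP "on".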