On Mon, Aug 29, 2022 at 11:47 AM Jeff Davis <pg...@j-davis.com> wrote:
> Sounds like a good goal, and loosely follows the precedent of
> checkpoint targets and vacuum cost delays.
Right.

> Why is the threshold per-table? Imagine someone who has a bunch of 4GB
> partitions that add up to a huge amount of deferred freezing work.

I think it's possible that our cost model will eventually become very
sophisticated, weigh all kinds of different factors, and work as one
component of a new framework that dynamically schedules autovacuum
workers. My main goal in posting this v1 was validating the *general
idea* of strategies with cost models, and the related question of how
we might use VM snapshots for that. After all, even the basic concept
is totally novel.

> The initial problem you described is a system-level problem, so it
> seems we should track the overall debt in the system in order to keep
> up.

I agree that the problem is fundamentally a system-level problem. One
reason why vacuum_freeze_strategy_threshold works at the table level
right now is to get the ball rolling. In any case the specifics of how
we trigger each strategy are far from settled.

That's not the only reason why we think about things at the table level
in the patch set, though. There *are* some fundamental reasons why we
need to care about individual tables, rather than caring about unfrozen
pages at the system level *exclusively*. This is something that
vacuum_freeze_strategy_threshold kind of gets right already, despite
its limitations.

There are 2 aspects of the design that seemingly have to work at the
whole table level:

1. Concentration matters when it comes to wraparound risk.

Fundamentally, each VACUUM still targets exactly one heap rel, and
advances relfrozenxid at most once per VACUUM operation. While the
total number of "unfrozen heap pages" across the whole database is the
single most important metric, it's not *everything*. As a general rule,
there is much less risk in having a certain fixed number of unfrozen
heap pages spread fairly evenly among several larger tables, compared
to the case where the same number of unfrozen pages are all
concentrated in one particular table -- right now it'll often be one
particular table that is far larger than any other table. Right now the
pain is generally felt with large tables only.

2. We need to think about things at the table level in order to manage
costs *over time* holistically. (Closely related to #1.)

The ebb and flow of VACUUM for one particular table is a big part of
the picture here -- and will be significantly affected by table size.
We can probably always afford to risk falling behind on
freezing/relfrozenxid (i.e. we should prefer laziness) if we know that
we'll almost certainly be able to catch up later when things don't
quite work out. That makes small tables much less trouble, even when
there are many more of them (at least up to a point).

As you know, my high level goal is to avoid ever having to make huge
balloon payments to catch up on freezing, which is a much bigger risk
with a large table -- this problem is mostly a per-table problem (both
now and in the future). A large table will naturally require fewer,
larger VACUUM operations than a small table, no matter what approach is
taken with the strategy stuff. We therefore have fewer VACUUM
operations in a given week/month/year/whatever to spread out the burden
-- there will naturally be fewer opportunities. We want to create the
impression that each autovacuum does approximately the same amount of
work (or at least the same per new heap page for large append-only
tables).

It also becomes much more important to only dirty each heap page during
vacuuming ~once with larger tables. With a smaller table, there is a
much higher chance that the pages we modify will already be dirty from
user queries.
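To make the table-level trigger concrete, here's a rough sketch of the
kind of test I have in mind. The names are invented, and the idea that
the threshold is expressed in heap pages (with this default) is just an
assumption for the purposes of illustration -- it isn't the patch's
actual code:

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Hypothetical stand-in for the vacuum_freeze_strategy_threshold
     * GUC, expressed here in heap pages (units and default are
     * assumptions): 524288 8KiB pages is 4GB.
     */
    static uint64_t freeze_strategy_threshold_pages = 524288;

    /*
     * Choose a freezing strategy for one table.  Tables past the
     * threshold get eager freezing, spreading the burden across every
     * VACUUM instead of accumulating freezing debt that forces a
     * "balloon payment" later.  Smaller tables stay lazy, since we can
     * afford to catch up in some later VACUUM.
     */
    static bool
    use_eager_freeze_strategy(uint64_t rel_pages)
    {
        return rel_pages >= freeze_strategy_threshold_pages;
    }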
> > for this table, at this time: Is it more important to advance
> > relfrozenxid early (be eager), or to skip all-visible pages instead
> > (be lazy)? If it's the former, then we must scan every single page
> > that isn't all-frozen according to the VM snapshot (including every
> > all-visible page).
>
> This feels too absolute, to me. If the goal is to freeze more
> incrementally, well in advance of wraparound limits, then why can't we
> just freeze 1000 out of 10000 freezable pages on this run, and then
> leave the rest for a later run?

My remarks here applied only to the question of relfrozenxid
advancement -- not to freezing. Skipping strategy (relfrozenxid
advancement) is a concept distinct from, though related to, freezing
strategy. So I was making a very narrow statement about
invariants/basic correctness rules -- I wasn't arguing against
alternative approaches to freezing beyond the 2 freezing strategies
(not to be confused with skipping strategies) that appear in v1.

That's all I meant -- there is definitely no point in scanning only a
subset of the table's all-visible pages, as far as relfrozenxid
advancement is concerned (and skipping strategy is fundamentally a
choice about relfrozenxid advancement vs work avoidance, eagerness vs
laziness).
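To spell out the invariant I mean, here is a minimal sketch (invented
names, not the patch's actual control flow). With eager skipping, every
page that the VM snapshot doesn't show as all-frozen must be scanned,
including all-visible pages; with lazy skipping, all-visible pages can
be skipped too, at the cost of not advancing relfrozenxid:

    #include <stdbool.h>

    /*
     * Hypothetical sketch of the skipping invariant.  A VACUUM that
     * intends to advance relfrozenxid (eager skipping) must scan every
     * page not marked all-frozen in the VM snapshot.
     */
    static bool
    must_scan_page(bool all_frozen, bool all_visible, bool eager_skipping)
    {
        if (all_frozen)
            return false;           /* skippable under either strategy */
        if (all_visible)
            return eager_skipping;  /* eager skipping must scan these */
        return true;                /* neither bit set: always scan */
    }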
Maybe you're right that there is room for additional freezing
strategies, besides the two added by the v1-0003-* patch. Definitely
seems possible. The freezing strategy concept should be usable as a
framework for adding additional strategies, including (just for
example) a strategy that decides ahead of time to freeze only so many
pages, though not others (without regard for the fact that the pages
that we are freezing may not be very different from those we won't be
freezing in the current VACUUM). I'm definitely open to that. It's just
a matter of characterizing what set of workload characteristics this
third strategy would address, how users might opt in or opt out, etc.

Both the eager and the lazy freezing strategies are based on some
notion of what's important for the table, based on its known
characteristics, and based on what seems likely to happen to the table
in the future (the next VACUUM, at least). I'm not completely sure how
many strategies we'll end up needing. It does seem like the eager/lazy
trade-off is a really important part of how these strategies will need
to work, in general.

(Thinks some more) I guess that such an alternative freezing strategy
would probably have to affect the skipping strategy too. It's tricky to
tease apart because it breaks the idea that skipping strategy and
freezing strategy are basically distinct questions. That is a factor
that makes it a bit more complicated to discuss. In any case, as I
said, I have an open mind about alternative freezing strategies beyond
the 2 basic lazy/eager freezing strategies from the patch.

> What if we thought about this more like a "background freezer". It
> would keep track of the total number of unfrozen pages in the system,
> and freeze them at some kind of controlled/adaptive rate.

I like the idea of storing metadata in shared memory. And scheduling
and deprioritizing running autovacuums. Being able to slow down or even
totally halt a given autovacuum worker without much consequence is
enabled by the VM snapshot concept.

That said, this seems like future work to me. It's worth discussing,
but I'm trying to keep it out of scope for the first version that gets
committed.

> Regular autovacuum's job would be to keep advancing relfrozenxid for
> all tables and to do other cleanup, and the background freezer's job
> would be to keep the absolute number of unfrozen pages under some
> limit. Conceptually those two jobs seem different to me.

The problem with making it such a sharp distinction is that it can be
very useful to manage costs by making it the job of VACUUM to do both
-- we can avoid dirtying the same page multiple times.

I think that we can accomplish the same thing by giving VACUUM more
freedom to do either more or less work, based on the observed
characteristics of the table, and some sense of how costs will tend to
work over time, across multiple distinct VACUUM operations. In practice
that might end up looking very similar to what you describe.

It seems undesirable for VACUUM to ever be too sure of itself -- the
information that triggers autovacuum may not be particularly reliable,
which can be addressed to some degree by making as many decisions as
possible at runtime, dynamically, based on the most authoritative and
recent information available. Delaying committing to one particular
course of action isn't always possible, but when it is possible (and
not too expensive) we should do it that way on general principle.

> Also, regarding patch v1-0001-Add-page-level-freezing, do you think
> that narrows the conceptual gap between an all-visible page and an all-
> frozen page?

Yes, definitely. However, I don't think that we can just get rid of the
distinction completely -- though I did think about it for a while. For
one thing we need to be able to handle cases like the one where
heap_lock_tuple() modifies an all-frozen page, leaving it merely
all-visible -- without that making the page unskippable to every future
VACUUM operation.

--
Peter Geoghegan