On Tue, Aug 30, 2022 at 11:11 AM Jeff Davis <pg...@j-davis.com> wrote:
> The solution involves more changes to the philosophy and mechanics of
> vacuum than I would expect, though. For instance, VM snapshotting,
> page-level-freezing, and a cost model all might make sense, but I don't
> see why they are critical for solving the problem above.

I certainly wouldn't say that they're critical. I tend to doubt that I can be perfectly crisp about the exact relationship between each component in isolation and how it contributes towards addressing the problems we're concerned with.

> I think I'm
> still missing something. My mental model is closer to the bgwriter and
> checkpoint_completion_target.

That's not a bad starting point. The main thing that that mental model is missing is how the timeframes work with VACUUM, and the fact that there are multiple timeframes involved (maybe the system's vacuuming work could be seen as having one timeframe at the highest level, but it's more of a fractal picture overall).

Checkpoints just don't take that long, and checkpoint duration has a fairly low variance (barring pathological performance problems). You only have so many buffers that you can dirty, too -- it's a self-limiting process. This is even true when (for whatever reason) the checkpoint_completion_target logic just doesn't do what it's supposed to do. There is more or less a natural floor on how bad things can get, so you don't have to invent a synthetic floor at all.

LSM-based DB systems like the MyRocks storage engine for MySQL don't use checkpoints at all -- the closest analog is compaction, which is closer to a hybrid of VACUUM and checkpointing than anything else. The LSM compaction model necessitates adding artificial throttling to keep the system stable over time [1]. There is a disconnect between the initial ingest of data and the compaction process. And so top-down modelling of compaction's costs and benefits is more natural with an LSM [2] -- and not a million miles from the strategy stuff I'm proposing.

> Allow me to make a naive counter-proposal (not a real proposal, just so
> I can better understand the contrast with your proposal):
> I know there would still be some problem cases, but to me it seems like
> we solve 80% of the problem in a couple dozen lines of code.

It's not that this statement is wrong, exactly. It's that I believe it is all but mandatory for me to ameliorate the downside that goes with more eager freezing, for example by not freezing at all when it doesn't seem to make sense. I want to solve the big problem of freeze debt without creating any new problems. And if I should also make things in adjacent areas better too, so much the better.

Why stop at a couple dozen lines of code? Why not just change the defaults of vacuum_freeze_min_age and vacuum_multixact_freeze_min_age to 0?
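To make that contrast concrete: the bluntest version of eager freezing needs no new code at all, just the existing GUCs (an illustrative sketch only -- "t" stands in for some real table):

    -- Freeze every eligible tuple right away, rather than waiting for
    -- its XID to be ~50 million transactions old (the stock default):
    SET vacuum_freeze_min_age = 0;
    SET vacuum_multixact_freeze_min_age = 0;
    VACUUM t;

The problem is that this freezes indiscriminately, including tuples that are about to be updated or deleted anyway -- extra WAL and extra dirty pages, with no lasting benefit. That's the downside that needs ameliorating.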
> a. Can you clarify some of the problem cases, and why it's worth
> spending more code to fix them?

For one thing, if we're going to do a lot of extra freezing, we really want to "get credit" for it afterwards, by updating relfrozenxid to reflect the new oldest extant XID, and so avoid getting an antiwraparound VACUUM in the near future.

That isn't strictly true, of course. But I think that we at least ought to have a strong bias in the direction of updating relfrozenxid, having decided to do significantly more freezing in some particular VACUUM operation.
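"Getting credit" is observable from the outside, for what it's worth. A query along these lines (just an illustration, not something from the patch series) shows how far each table's relfrozenxid lags:

    -- Tables with the oldest relfrozenxid, i.e. the most "credit" still
    -- outstanding; successful advancement shows up as a drop in xid_age:
    SELECT oid::regclass AS table_name,
           age(relfrozenxid) AS xid_age
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY age(relfrozenxid) DESC
    LIMIT 10;

If a VACUUM does a lot of extra freezing but xid_age barely moves here, we paid the cost of freezing without getting the relfrozenxid benefit.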
> b. How much of your effort is groundwork for related future
> improvements? If it's a substantial part, can you explain in that
> larger context?

Hard to say. It's true that the idea of VM snapshots is quite general, and could have been introduced in a number of different ways. But I don't think that should count against it.

It's also not something that seems contrived or artificial -- it's at least as good a reason to add VM snapshots as any other I can think of. Does it really matter if this project is the freeze debt project or the VM snapshot project? Do we even need to decide which one it is right now?

> c. Can some of your patches be separated into independent discussions?
> For instance, patch 1 has been discussed in other threads and seems
> independently useful, and I don't see the current work as dependent on
> it.

I simply don't know if I can usefully split it up just yet.

> Patch 4 also seems largely independent.

Patch 4 directly compensates for a problem created by the earlier patches. The patch series as a whole isn't supposed to ameliorate the problem of MultiXacts being allocated in VACUUM. It only needs to avoid making the situation any worse than it is today, IMV (I suspect that the real fix is to make the VACUUM FREEZE command not tune vacuum_freeze_min_age).

> d. Can you help give me a sense of scale of the problems solved by
> visibilitymap snapshots and the cost model? Do those need to be in v1?

I'm not sure. I think that having certainty up-front that we'll only scan so many pages is very broadly useful, though. Plus it removes the SKIP_PAGES_THRESHOLD stuff, which was intended to enable relfrozenxid advancement in non-aggressive VACUUMs, but does so in a way that results in scanning many more pages needlessly. See commit bf136cf6, which added the SKIP_PAGES_THRESHOLD stuff back in 2009, shortly after the visibility map first appeared.

Since relfrozenxid advancement fundamentally works at the table level, it seems natural to make it a top-down, VACUUM-level thing -- even within non-aggressive VACUUMs (I guess it already meets that description in aggressive VACUUMs). And since we really want to advance relfrozenxid when we do extra freezing (for the reasons I just went into), it seems natural to me to view it as one problem. I accept that it's not clear cut, though.

[1] https://docs.google.com/presentation/d/1WgP-SlKay5AnSoVDSvOIzmu7edMmtYhdywoa0oAR4JQ/edit?usp=sharing
[2] https://disc-projects.bu.edu/compactionary/research.html

--
Peter Geoghegan