On Wed, Feb 9, 2022 at 6:13 AM Robert Haas <robertmh...@gmail.com> wrote:

> Just to be clear, when I say that the dead index tuples don't matter here, I mean from the point of view of the index. From the point of view of the table, the presence of dead index tuples (or even the potential presence of dead tuples) pointing to dead line pointers is an issue that can drive heap bloat. But from the point of view of the index, because we don't ever merge sibling index pages, and because we have kill_prior_tuple, there's not much value in freeing up space in index pages unless it either prevents a split or lets us free the whole page. So I agree with Peter that index growth is what really matters.
One small caveat that I'd add is this: heap fragmentation likely makes it harder to avoid page splits in indexes, to a degree. It is arguably one cause of the page splits that do happen in a table like pgbench_tellers, with standard pgbench (and lots of throughput, lots of clients). The tellers (and branches) primary keys tend to double in size in my recent tests (still way better than a 20x increase in size, which is what happened in Postgres 11 and maybe even 13). I think that it might be possible to perfectly preserve the original index size (even with ANALYZE running now and again) by setting heap fill factor very low, maybe 50 or less.

This is a minor quibble, though. It still makes sense to think of heap fragmentation as a problem for the heap itself, and not for indexes at all, since the effect I describe is relatively insignificant, and just about impossible to model. The problem really is that the heap pages are failing to hold their original logical rows in place -- the index size issue is more of a symptom than a problem unto itself.

> However, I have a concern that Peter's idea to use the previous index growth to drive future index vacuuming is retrospective rather than prospective. If the index is growing more than it should based on the data volume, then evidently we didn't do enough vacuuming at some point in the past.

That's a very valid concern. As you know, the great advantage of retrospectively considering what's not going well (and reducing everything to some highly informative measure like growth in index size) is that it's reliable -- you don't have to understand precisely how things got that way, which is just too complicated to get right. And as you point out, the great disadvantage is that it has already happened -- which might already be too late. More on that later...

> It's reasonable to step up our efforts in the present to make sure that the problem doesn't continue, but in some sense it's already too late. What we would really like is a measure that answers the question: is the index going to bloat in the relatively near future if we don't vacuum it now? I think that the dead tuple count is trying, however imperfectly, to figure that out. All other things being equal, the more dead tuples there are in the index, the more bloat we're going to have later if we don't clean them out now.

One of the key intuitions behind bottom-up index deletion is to treat the state of an index page as a dynamic thing, not a static thing. The information that we take from the page to drive our decisions is very reliable on average, over time, in the aggregate. At the same time, the information is very noisy, and could be wrong in important ways at just about any time. The fundamental idea was to take advantage of the first property, without ever getting killed by the second property.

To me this seems conceptually similar to how one manages risk when playing a game of chance while applying the Kelly criterion. The criterion provides probabilistic certainty about the best strategy to use in a situation where we have known favorable odds, an infinite series of bets, and a personal bankroll. I recently came across a great blog post about it, which gets the idea across well:

https://explore.paulbutler.org/bet/

If you scroll down to the bottom of the page, there are some general conclusions, some of which are pretty counterintuitive. Especially "Maximizing expected value can lead to a bad long-run strategy".
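To see why, it's enough to play with a toy version of that kind of game: an even-money bet that you win 60% of the time, where the Kelly fraction works out to 2p - 1 = 0.2. This little program is mine, purely for illustration (the numbers aren't taken from the blog post):

    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        const double p = 0.6;           /* chance of winning an even-money bet */
        const double kelly = 2 * p - 1; /* Kelly fraction for even odds: 0.2 */
        double      all_in = 1.0;       /* bankroll when staking everything */
        double      fractional = 1.0;   /* bankroll when staking the Kelly fraction */

        srand(42);
        for (int i = 0; i < 1000; i++)
        {
            int     won = (rand() % 100) < (int) (p * 100);

            /* staking everything doubles the bankroll, or zeroes it for good */
            all_in = won ? all_in * 2.0 : 0.0;

            /* staking a fixed fraction compounds, win or lose */
            fractional *= won ? (1.0 + kelly) : (1.0 - kelly);
        }

        printf("all-in: %g, Kelly: %g\n", all_in, fractional);
        return 0;
    }

The all-in strategy maximizes the expected value of every individual round, and still ends up at zero with near certainty, because a single loss is ruin. The fractional strategy gives up some expected value per round in exchange for never being ruined, and so it's the one that actually grows over time.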
Growth over time is much more important than anything else, since you can play as many individual games as you like -- provided you never go bankrupt. I think that *a lot* of problems can be usefully analyzed that way, or at least benefit from a general awareness of certain paradoxical aspects of probability. Bankruptcy must be recognized as qualitatively different from any non-zero bankroll. A little bit like how splitting a leaf page unnecessarily is truly special: since we never merge sibling pages, the split can never be undone. Once we simply avoid ruin, we can get the benefit of playing a game with favorable odds -- growth over time is what matters, not necessarily our current bankroll. It sounds like a fairly small difference, but that's deceptive -- it's a huge difference.

> The problem is not with that core idea, which IMHO is actually good, but that all other things are NOT equal.

...now to get back to talking about VACUUM itself, and to this project. I couldn't agree more -- all other things are NOT equal. We need a way to manage the risk that things could change quickly when an index that we believe has many dead tuples hasn't grown at all just yet. It's probably also true that we should try to predict the near future, and not 100% rely on the fact that what we've been doing seems to have worked so far -- I do accept that.

We should probably dispense with the idea that we'll be making these decisions about what to do with an index like this (bloated in a way that bottom-up index deletion just won't help with) in an environment anything like the one that the current "skip index scan when # heap pages with one or more LP_DEAD items < 2% of rel_pages" mechanism has to operate in. That mechanism has to be very conservative because we just don't know when the next opportunity to vacuum indexes will be -- we almost have to assume that the decision will be static, and made exactly once, so we had better be defensive. But why should that continue to be true with the conveyor belt stuff in place, and with periodic mini-vacuums that coordinate over time? I don't think it has to be like that. We can make it much more dynamic.

I can imagine a two-way dialog between the index AM and vacuumlazy.c that takes place over time. The index AM might be able to report something along the lines of: "While I think that this index has more dead index tuples than it really should, the fact is that it hasn't grown at all, even by one single leaf page. And so you [vacuumlazy.c] should not make me vacuum the index right now. But be careful -- check back in again in another minute or two, because the situation must be assumed to be pretty volatile for now."

This is just an example, not a concrete design. My point is that approaching the problem dynamically makes it *vastly* easier to do the right thing. It's far easier to manage the consequences of being wrong than it is to be right all the time. We're going to be wrong anyway, so better to be wrong on our own terms.
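Purely to give that dialog a concrete shape (every name below is invented for the sake of discussion -- nothing like it exists in the patch), I'm picturing a small per-index report that the index AM fills in and that vacuumlazy.c consults on each mini-vacuum cycle:

    /* Hypothetical -- names and fields are only for illustration */
    typedef struct IndexVacuumReport
    {
        BlockNumber pages_at_last_vacuum;   /* leaf pages when last vacuumed */
        BlockNumber pages_now;              /* leaf pages right now */
        double      dead_tuples_guess;      /* AM's own (noisy) estimate */
        int         recheck_in_seconds;     /* "ask me again this soon" */
    } IndexVacuumReport;

    /* vacuumlazy.c's side of the conversation, equally hypothetical */
    static bool
    index_wants_vacuum(IndexVacuumReport *report)
    {
        if (report->pages_now <= report->pages_at_last_vacuum)
        {
            /*
             * No growth at all since the last index vacuum: defer, but only
             * for a short while, since the situation is assumed to be
             * volatile -- the dead tuples haven't gone anywhere.
             */
            report->recheck_in_seconds = 60;
            return false;
        }
        return true;
    }

Nothing about the specifics matters here -- only that the answer can change from one check-in to the next, cheaply, instead of being decided once and defensively.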
> I'm not hung up on using the # of dead tuples specifically as the metric for index vacuuming, and it may be best to pick some other measure. But I am a little suspicious that if the only measure is past index growth, we will let some situations go too far before we wake up and do something about them. My intuition is that it would be a good idea to come up with something we could measure, even if it's imperfect, that would give us some clue that trouble is brewing before pages actually start splitting. Now maybe my intuition is wrong and there is nothing better, but I think it's worth a thought.

We will need something like that. I think that LP_DEAD items (or would-be LP_DEAD items -- tuples with storage that would get pruned into LP_DEAD items if we were to prune) in the table are much more interesting than dead heap-only tuples, and also more interesting than dead index tuples. Especially the distribution of such LP_DEAD items in the table, and their concentration. That does seem much more likely to be robust as a quantitative driver of index vacuuming.

In the extreme case, when there is a huge number of LP_DEAD items in the table, we're going to want to make them LP_UNUSED anyway, which implies that we'll do index vacuuming to make that safe. Since that's already going to be true, maybe we should try to find a way to usefully scale the behavior, so that some indexes are vacuumed sooner when the number of LP_DEAD items is increasing. Not really sure, but that seems more promising than anything else.
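If it helps to see the kind of scaling I mean, here is a deliberately crude sketch (the names, and especially the thresholds, are made up on the spot -- this is not a proposal):

    /*
     * Made-up heuristic, only to illustrate "scaling the behavior": when the
     * concentration of heap pages with LP_DEAD items gets high enough, every
     * index gets vacuumed; below that, only indexes that have grown do.
     */
    static bool
    should_vacuum_index_now(BlockNumber rel_pages,
                            BlockNumber pages_with_lpdead,
                            BlockNumber index_growth_pages)
    {
        double      lpdead_frac;

        if (rel_pages == 0)
            return false;

        lpdead_frac = (double) pages_with_lpdead / rel_pages;

        /* lots of LP_DEAD-bearing heap pages: vacuum all indexes soon */
        if (lpdead_frac >= 0.10)
            return true;

        /* otherwise only indexes that have actually grown get vacuumed now */
        return index_growth_pages > 0;
    }

Only the general direction is meant seriously here, not any of the numbers.

--
Peter Geoghegan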