Hi,

On 2021-04-22 11:30:21 -0700, Peter Geoghegan wrote:
> I think that you're both missing very important subtleties here.
> Apparently the "quantitative vs qualitative" distinction I like to
> make hasn't cleared it up.

I'm honestly getting a bit annoyed about this stuff. Yes it's a cool
improvement, but no, it doesn't mean that there aren't still relevant issues in
important cases. It doesn't help that you repeatedly imply that people that
don't see it your way need to have their view "cleared up".

"Bottom up index deletion" is practically *irrelevant* for a significant set of
workloads.

> You both seem to be assuming that everything would be fine if you
> could somehow inexpensively know the total number of undeleted dead
> tuples in each index at all times.

I don't think we'd need an exact number. Just a reasonable approximation so we
know whether it's worth spending time vacuuming some index.

> But I don't think that that's true at all. I don't mean that it might
> not be true. What I mean is that it's usually a meaningless number *on
> its own*, at least if you assume that every index is either an nbtree
> index (or an index that uses some other index AM that has the same
> index deletion capabilities).

You also have to assume that you have roughly evenly distributed index
insertions and deletions. But workloads that insert into some parts of a value
range and delete from another range are common.

I even would say that *precisely* because "Bottom up index deletion" can be
very efficient in some workloads it is useful to have per-index stats
determining whether an index should be vacuumed or not.

> My mental models for index bloat usually involve imagining an
> idealized version of a real world bloated index -- I compare the
> empirical reality against an imagined idealized version. I then try to
> find optimizations that make the reality approximate the idealized
> version. Say a version of the same index in a traditional 2PL database
> without MVCC, or in real world Postgres with VACUUM that magically
> runs infinitely fast.
>
> Bottom-up index deletion usually leaves a huge number of
> undeleted-though-dead index tuples untouched for hours, even when it
> works perfectly.
> 10% - 30% of the index tuples might be
> undeleted-though-dead at any given point in time (traditional B-Tree
> space utilization math generally ensures that there is about that much
> free space on each leaf page if we imagine no version churn/bloat --
> we *naturally* have a lot of free space to work with). These are
> "Schrodinger's dead index tuples". You could count them
> mechanistically, but then you'd be counting index tuples that are
> "already dead and deleted" in an important theoretical sense, despite
> the fact that they are not yet literally deleted. That's why bottom-up
> index deletion frequently avoids 100% of all unnecessary page splits.
> The asymmetry that was there all along was just crazy. I merely had
> the realization that it was there and could be exploited -- I didn't
> create or invent the natural asymmetry.

Except that heap bloat not index bloat might be the more pressing concern. Or
that there will be no meaningful amount of bottom-up deletions. Or ...

Greetings,

Andres Freund