On Wed, Sep 9, 2020 at 11:02 AM Andres Freund <and...@anarazel.de> wrote:
> Obviously lookup in such a more complicated structure isn't free. Nor is
> building it. So we'd need some heuristics about when to do so. It'd
> probably be OK to occasionally look at the age of the oldest in-progress
> statement, to infer the time for old snapshots based on that. And then
> we could have a GUC that says when it's worth doing more complicated
> lookups.
>
> I don't have a handle on how to deal with the ctid chaining for
> intermediate row versions.
I wonder if it makes sense to add the feature to nbtree first. That sidesteps the practical problem of having to figure out ctid chaining. You can prove the idea in a relatively low-risk way, while still creating significant value for users. We can reason about versioning within indexes even without authoritative information about it close at hand. The trick is to delay everything until it looks like we'll have to split the page.

I can see very substantial improvements to index bloat from non-HOT updates with the deduplication-deletion patch I've been working on:

https://postgr.es/m/cah2-wznjqqwspsxbiz6wqybikqfmfdjodeqp28otqw551t7...@mail.gmail.com

Importantly, the patch tends to bound the number of distinct index entries *per logical row* in the presence of non-HOT updates (at least for indexes that are not logically modified by the update). ISTM that this is what really matters. The patch doesn't just reduce index bloat -- it also caps the number of heap page accesses that any given primary key point lookup will have to make. So in practice we eagerly kill precisely the garbage index tuples that would have caused us the most trouble without the new mechanism in place.

Avoiding a page split is a big win, so we can probably justify relatively expensive lookup calls to make this work. That seems especially likely to be true because we can ratchet up to the expensive checks only when we see that they will win. If a unique index has 10 entries for one value (10 versions), which is inevitable once HOT pruning fails, then how many of those 10 do we really need? The answer is perhaps just 2 or 3, maybe 99%+ of the time. (A toy sketch of the "delay until a split looks inevitable" trigger appears after my signature.)

I strongly suspect that preventing page splits is more valuable than opportunistic heap pruning (unless the pruning itself prevents page splits, which it may or may not do). Logically unnecessary page splits have obvious lasting consequences, are usually highly correlated, and affect performance in ways that are non-linear and likely very harmful in the real world.

--
Peter Geoghegan
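P.S. To make the trigger I have in mind concrete, here is a toy, self-contained
sketch in plain C. This is not actual nbtree code: IndexEntry, LEAF_CAPACITY,
version_needed, and delete_unneeded_versions are all made-up names, and the
snapshot-visibility test is stubbed out as a flag, since the real check is
exactly the expensive lookup under discussion.

/*
 * Toy standalone sketch (not PostgreSQL code) of the idea above: only
 * when a leaf page looks like it is about to split do we go to the
 * expense of working out which duplicate index entries point to heap
 * row versions that no snapshot can still see, and remove those.
 */
#include <stdbool.h>
#include <stdio.h>

#define LEAF_CAPACITY 8   /* pretend a leaf page holds 8 entries */

typedef struct IndexEntry
{
    int  key;             /* indexed value */
    int  heap_tid;        /* stand-in for a heap TID (row version id) */
    bool version_needed;  /* would some snapshot still need this version? */
} IndexEntry;

/*
 * Expensive in real life (it implies heap/visibility lookups), so we
 * only run it when a split is otherwise imminent.  Compacts the array
 * in place and returns the number of entries kept.
 */
static int
delete_unneeded_versions(IndexEntry *entries, int nentries)
{
    int kept = 0;

    for (int i = 0; i < nentries; i++)
    {
        if (entries[i].version_needed)
            entries[kept++] = entries[i];
        /* else: garbage version of a logical row -- drop its entry */
    }
    return kept;
}

int
main(void)
{
    /* 10 versions of one logical row (key 42), as after 9 non-HOT updates */
    IndexEntry page[16];
    int nentries = 0;

    for (int v = 1; v <= 10; v++)
    {
        page[nentries].key = 42;
        page[nentries].heap_tid = v;
        /* suppose only the two newest versions are still needed */
        page[nentries].version_needed = (v >= 9);
        nentries++;
    }

    if (nentries >= LEAF_CAPACITY)      /* split looks inevitable... */
    {
        nentries = delete_unneeded_versions(page, nentries);
        printf("kept %d of 10 entries; split avoided: %s\n",
               nentries, nentries < LEAF_CAPACITY ? "yes" : "no");
    }

    return 0;
}

Run as-is, it keeps 2 of the 10 entries for the one logical row and the
split is avoided. The point is just that the expensive work happens at
most once per would-be split, which is what makes a relatively costly
check affordable.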