Hi,

On 2020-09-09 10:14:04 -0400, Stephen Frost wrote:
> > I've been toying with a patch that introduces more smarts about when a
> > row is removable, by looking more closely at whether specific row
> > versions are visible (e.g. in the common case of one old snapshot and
> > lots of newer rows). But that's orders of magnitude more complicated. So
> > going for something as simple as this seems like a good idea.
>
> I've wondered about this for a long time- very cool that you've found
> time to actually work on a patch. A couple of different ideas were
> discussed previously about how to do that kind of a check- mind talking
> about what method you're using, or perhaps just sharing that patch? :)
It's very, very early, and it doesn't really work yet. I basically tried
to plug a bit more intelligence into the dead tuple detection (which, with
the recent snapshot scalability changes, can now easily store more state
and build it incrementally). Detecting that tuples newer than the horizon
are dead isn't that complicated - what's hard is not breaking things when
ctid chains lack intermediate versions. To avoid that I had to restrict
the logic to inserted (not updated) tuples that were subsequently deleted.
And my heuristic supported only one old snapshot.

Building a bsearchable list of valid (xmin, xmax] ranges isn't that hard.
Some care needs to be taken to make the list non-overlapping, but that's
easy enough by just merging entries.

Obviously a lookup in such a more complicated structure isn't free. Nor is
building it. So we'd need some heuristics about when to do so. It'd
probably be OK to occasionally look at the age of the oldest in-progress
statement, and infer the time for old snapshots based on that. And then we
could have a GUC that says when it's worth doing more complicated lookups.

I don't have a handle on how to deal with the ctid chaining for
intermediate row versions.

Greetings,

Andres Freund