Hi,

On 2021-06-16 13:04:07 -0400, Tom Lane wrote:
> Yeah, I think this scenario of a few transactions with old snapshots
> and the rest with very new ones could be improved greatly if we exposed
> more info about backends' snapshot state than just "oldest xmin". But
> that might be expensive to do.

I think it'd be pretty doable now. The snapshot scalability changes
separated out information needed to do vacuuming / pruning (i.e. xmin)
from the information needed to build a snapshot (xid, flags, subxids
etc). Because xmin is not frequently accessed from other backends
anymore, it is not important anymore to touch it as rarely as possible.

From the cross-backend POV I think it'd be practically free to track a
backend's xmax now.

It's not quite as obvious that it'd be essentially free to track a
backend's xmax across all the snapshots it uses. I think we'd basically
need a second pairingheap in snapmgr.c to track the "most advanced"
xmax? That's *probably* fine, but I'm not 100% sure: Heikki wrote a
faster heap implementation for snapmgr.c for a reason, I assume.

I think the hard part of this would be much more on the pruning / vacuum
side of things. There are two difficulties:

1) Keeping it cheap to determine whether a tuple can be vacuumed,
   particularly while doing on-access pruning. This likely means that
   we'd only assemble the information to do visibility determination for
   rows above the "dead for everybody" horizon when encountering a
   sufficiently old tuple. And then we need a decent datastructure for
   checking whether an xid is in one of the "not needed" xid ranges.

   This seems solvable.

2) Modeling when it is safe to remove row versions. It is easy to remove
   a tuple that was inserted and deleted within one "not needed" xid
   range, but it's far less obvious when it is safe to remove row
   versions where prior/later row versions are outside of such a gap.

   Consider e.g. an update chain where the oldest snapshot can see one
   row version, then there is a chain of rows that could be vacuumed
   except for the old snapshot, and then there's a live version. If the
   old session updates the row version that is visible to it, it needs
   to be able to follow the xid chain.

   This seems hard to solve in general.

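To make the lookup in 1) and the "easy" case in 2) a bit more concrete,
here is a minimal sketch. It is not PostgreSQL code: XidRange,
xid_range_lookup() and tuple_within_one_range() are names made up for
illustration, the ranges are assumed to be sorted and non-overlapping,
and xid wraparound is ignored (real code would need
TransactionIdPrecedes()-style comparisons rather than plain < and >=).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: half-open range [start, end) of xids no snapshot needs. */
typedef struct XidRange
{
    uint32_t    start;
    uint32_t    end;
} XidRange;

/*
 * Binary search over ranges sorted by start, non-overlapping.
 * Returns the index of the range containing xid, or -1 if there is none.
 * Assumption: plain integer comparison, i.e. no xid wraparound.
 */
static int
xid_range_lookup(const XidRange *ranges, size_t nranges, uint32_t xid)
{
    size_t      lo = 0;
    size_t      hi = nranges;

    assert(ranges != NULL || nranges == 0);

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (xid < ranges[mid].start)
            hi = mid;           /* xid lies before this range */
        else if (xid >= ranges[mid].end)
            lo = mid + 1;       /* xid lies after this range */
        else
            return (int) mid;   /* start <= xid < end */
    }
    return -1;
}

/*
 * The "easy" case: a row version inserted (xmin) and deleted (xmax)
 * within the same "not needed" range.
 */
static bool
tuple_within_one_range(const XidRange *ranges, size_t nranges,
                       uint32_t xmin, uint32_t xmax)
{
    int         idx = xid_range_lookup(ranges, nranges, xmin);

    return idx >= 0 && idx == xid_range_lookup(ranges, nranges, xmax);
}
```

With a handful of ranges a linear scan might well be just as fast; the
sorted array is only meant to show that the check itself can be cheap
once the ranges have been assembled.
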
It is perhaps sufficiently effective to remove row version chains that
lie entirely within one removable xid range. And it'd probably be doable
to also address the case where a chain is larger than one range, as long
as all the relevant row versions are within one page: we can fix up the
ctids of older, still-visible row versions to point to the successor of
the pruned row versions.

But I have a hard time seeing a realistic approach to removing chains
that span xid ranges and multiple pages. The locking and efficiency
issues seem substantial.

Greetings,

Andres