On Wed, Apr 29, 2020 at 12:54 PM Andres Freund <and...@anarazel.de> wrote: > > Fundamentally, btvacuumpage() doesn't freeze 32-bit XIDs (from > > bpto.xact) when it recycles deleted pages. It simply puts them in the > > FSM without changing anything about the page itself. This means > > surprisingly little in the context of nbtree: the > > _bt_page_recyclable() XID check that takes place in btvacuumpage() > > also takes place in _bt_getbuf(), at the point where the page actually > > gets recycled by the client. That's not great. > > I think it's quite foolish for btvacuumpage() to not freeze xids. If we > only do so when necessary (i.e. older than a potential new relfrozenxid, > and only when the vacuum didn't yet skip pages), the costs are pretty > miniscule.
I wonder if we should just bite the bullet and mark pages that are recycled by VACUUM as explicitly recycled, with WAL-logging and everything (this is like freezing, but stronger). That way, the _bt_page_recyclable() call within _bt_getbuf() would only be required to check that state (while btvacuumpage() would use something like a _bt_page_eligible_for_recycling(), which would do the same thing as the current _bt_page_recyclable()). I'm not sure how low the costs would be, but at least we'd only have to do it once per already-deleted page (i.e. only the first time a VACUUM operation found _bt_page_eligible_for_recycling() returned true for the page and marked it recycled in a crash safe manner). That design would be quite a lot simpler, because it expresses the problem in terms that make sense to the nbtree code. _bt_getbuf() should not have to make a distinction between "possibly recycled" versus "definitely recycled". It makes sense that the FSM is not crash safe, I suppose, but we're talking about relatively large amounts of free space here. Can't we just do it properly/reliably? -- Peter Geoghegan