On Wed, Apr 29, 2020 at 12:54 PM Andres Freund <and...@anarazel.de> wrote:
> > Fundamentally, btvacuumpage() doesn't freeze 32-bit XIDs (from
> > bpto.xact) when it recycles deleted pages. It simply puts them in the
> > FSM without changing anything about the page itself. This means
> > surprisingly little in the context of nbtree: the
> > _bt_page_recyclable() XID check that takes place in btvacuumpage()
> > also takes place in _bt_getbuf(), at the point where the page actually
> > gets recycled by the client. That's not great.
>
> I think it's quite foolish for btvacuumpage() to not freeze xids. If we
> only do so when necessary (i.e. older than a potential new relfrozenxid,
> and only when the vacuum didn't yet skip pages), the costs are pretty
> miniscule.

I wonder if we should just bite the bullet and mark pages that are
recycled by VACUUM as explicitly recycled, with WAL-logging and
everything (this is like freezing, but stronger). That way, the
_bt_page_recyclable() call within _bt_getbuf() would only be required
to check that state (while btvacuumpage() would use something like a
_bt_page_eligible_for_recycling(), which would do the same thing as
the current _bt_page_recyclable()).

I'm not sure how low the costs would be, but at least we'd only have
to do it once per already-deleted page (i.e. only the first time a
VACUUM operation found _bt_page_eligible_for_recycling() returned true
for the page and marked it recycled in a crash safe manner). That
design would be quite a lot simpler, because it expresses the problem
in terms that make sense to the nbtree code. _bt_getbuf() should not
have to make a distinction between "possibly recycled" versus
"definitely recycled".

It makes sense that the FSM is not crash safe, I suppose, but we're
talking about relatively large amounts of free space here. Can't we
just do it properly/reliably?

-- 
Peter Geoghegan


Reply via email to