> On Mar 16, 2021, at 11:40 AM, Robert Haas <robertmh...@gmail.com> wrote:
>
> On Tue, Mar 16, 2021 at 2:22 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I'm circling back around to the idea that amcheck is trying to
>> validate TOAST references that are already dead, and it's getting
>> burnt because something-or-other has already removed the toast
>> rows, though not the referencing datums. That's legal behavior
>> once the rows are marked dead. Upthread it was claimed that
>> amcheck isn't doing that, but this looks like a smoking gun to me.
>
> I think this theory has some legs. From check_tuple_header_and_visibilty():
>
> else if (!(infomask & HEAP_XMAX_COMMITTED))
> return true; /*
> HEAPTUPLE_DELETE_IN_PROGRESS or
> *
> HEAPTUPLE_LIVE */
> else
> return false; /*
> HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
> }
> return true; /* not dead */
> }
>
> That first case looks wrong to me. Don't we need to call
> get_xid_status() here, Mark? As coded, it seems that if the xmin is ok
> and the xmax is marked committed, we consider the tuple checkable. The
> comment says it must be HEAPTUPLE_DELETE_IN_PROGRESS or
> HEAPTUPLE_LIVE, but it seems to me that if the actual answer is either
> HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD depending on whether the
> xmax is all-visible. And in the second case the comment says it's
> either HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, but I think in that
> case it's either HEAPTUPLE_DELETE_IN_PROGRESS or
> HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, depending on the XID
> status.
>
> Another thought here is that it's probably not wicked smart to be
> relying on the hint bits to match the actual status of the tuple in
> cases of corruption. Maybe we should be warning about tuples that are
> have xmin or xmax flagged as committed or invalid when in fact clog
> disagrees. That's not a particularly uncommon case, and it's hard to
> check.
This code was not committed as part of the recent pg_amcheck work, but longer
ago, and I'm having trouble reconstructing exactly why it was written that way.
Changing check_tuple_header_and_visibilty() fixes the regression test and also
manual tests against the "regression" database that I've been using. I'd like
to ponder the changes a while longer before I post, but the fact that these
changes fix the tests seems to add credibility to this theory.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company