> On Mar 16, 2021, at 11:40 AM, Robert Haas <robertmh...@gmail.com> wrote:
> 
> On Tue, Mar 16, 2021 at 2:22 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I'm circling back around to the idea that amcheck is trying to
>> validate TOAST references that are already dead, and it's getting
>> burnt because something-or-other has already removed the toast
>> rows, though not the referencing datums.  That's legal behavior
>> once the rows are marked dead.  Upthread it was claimed that
>> amcheck isn't doing that, but this looks like a smoking gun to me.
> 
> I think this theory has some legs. From check_tuple_header_and_visibilty():
> 
>                else if (!(infomask & HEAP_XMAX_COMMITTED))
>                        return true;            /*
> HEAPTUPLE_DELETE_IN_PROGRESS or
>                                                                 *
> HEAPTUPLE_LIVE */
>                else
>                        return false;           /*
> HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
>        }
>        return true;                            /* not dead */
> }
> 
> That first case looks wrong to me. Don't we need to call
> get_xid_status() here, Mark? As coded, it seems that if the xmin is ok
> and the xmax is marked committed, we consider the tuple checkable. The
> comment says it must be HEAPTUPLE_DELETE_IN_PROGRESS or
> HEAPTUPLE_LIVE, but it seems to me that if the actual answer is either
> HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD depending on whether the
> xmax is all-visible. And in the second case the comment says it's
> either HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, but I think in that
> case it's either HEAPTUPLE_DELETE_IN_PROGRESS or
> HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, depending on the XID
> status.
> 
> Another thought here is that it's probably not wicked smart to be
> relying on the hint bits to match the actual status of the tuple in
> cases of corruption. Maybe we should be warning about tuples that are
> have xmin or xmax flagged as committed or invalid when in fact clog
> disagrees. That's not a particularly uncommon case, and it's hard to
> check.

This code was not committed as part of the recent pg_amcheck work, but longer 
ago, and I'm having trouble reconstructing exactly why it was written that way.

Changing check_tuple_header_and_visibilty() fixes the regression test and also 
manual tests against the "regression" database that I've been using.  I'd like 
to ponder the changes a while longer before I post, but the fact that these 
changes fix the tests seems to add credibility to this theory.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Reply via email to