Re: cannot freeze committed xmax

Mark Dilger Wed, 28 Oct 2020 09:22:21 -0700

> On Oct 28, 2020, at 8:56 AM, Konstantin Knizhnik <[email protected]> 
> wrote:
> 
> 
> 
> On 28.10.2020 18:25, Mark Dilger wrote:
>> 
>>> On Oct 28, 2020, at 6:44 AM, Konstantin Knizhnik 
>>> <[email protected]> wrote:
>>> 
>>> Looks like there is no assumption that xmax should be set to 
>>> InvalidTransactionId when HEAP_XMAX_INVALID bit is set.
>>> And I didn't find any check  preventing cutoff_xid to be greater than XID 
>>> of some transaction which was aborted long time ago.
>> Nothing in your example suggests that cutoff_xid is wrong, so I'd assume 
>> that part is probably working ok.
>> 
>> Your data shows that HEAP_XMAX_INVALID and HEAP_UPDATED flags are both set.  
>> That should only happen if the updating transaction aborted.  But the query 
>> of clog is saying that it committed. Something is wrong with that.  How did 
>> the hint bits get set to HEAP_XMAX_INVALID if the transaction did commit?  
>> Either some process is setting that hint bit when it shouldn't, or your clog 
>> is corrupted and returning a bogus answer about the xmax having been 
>> committed.  Either way, you've got corruption.
>> 
>> Your question "preventing cutoff_xid to be greater than XID of some 
>> transaction which was aborted long time ago" seems to be ignoring that 
>> TransactionIdDidCommit(xid) is returning true, suggesting the transaction 
>> did not abort.
>> 
>> —
>> Mark Dilger
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
> Yes, I forgot to say that transaction is treated as committed (txid_status() 
> returns "committed").
> Also database was previously upgraded from 11.5 to 11.7
> Certainly the hypothesis of CLOG corruption explains everything.
> I wonder if there can be some other scenario (upgrade, multixacts, previous 
> freeze attempt...) which can cause such combination of flags?
> I have inspected all cases where HEAP_XMAX_INVALID is set, but have not found 
> any one which can explain it.

The other possibility is that this tuple is erroneously marked as 
HEAP_UPDATED.  heap_update() sets that, which makes sense.  
rewrite_heap_tuple() copies the old tuple's bits to the new tuple and then does 
some work to resolve update chains.  I guess you could look at whether that 
logic might leave things in an invalid state.  I don't have any theory about 
that.
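
For reasoning about the bit combination offline, here is a minimal standalone 
sketch. It is not PostgreSQL source. The bit values are copied from 
src/include/access/htup_details.h as I understand them, so please verify them 
against your branch. hint_contradicts_clog(), thread_example(), 
FIRST_NORMAL_XID and the xid 1000 are hypothetical names and values used only 
for illustration. The helper flags the state under discussion: 
HEAP_XMAX_INVALID is set, yet the raw xmax is a normal xid that clog reports 
as committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copied from htup_details.h so the sketch compiles standalone (verify!). */
#define HEAP_XMAX_INVALID   0x0800  /* hint: t_xmax invalid/aborted */
#define HEAP_XMAX_IS_MULTI  0x1000  /* t_xmax is a MultiXactId */
#define HEAP_UPDATED        0x2000  /* this is the UPDATEd version of a row */

/* Assumption: xids below 3 are special (invalid, bootstrap, frozen). */
#define FIRST_NORMAL_XID 3u

/*
 * Hypothetical helper: true when the hint bit says "xmax invalid" while the
 * raw xmax is a plain, normal xid that clog reports as committed.
 * Multixact xmax values are deliberately not modelled here.
 */
bool
hint_contradicts_clog(uint16_t infomask, uint32_t raw_xmax,
                      bool clog_says_committed)
{
    return (infomask & HEAP_XMAX_INVALID) != 0 &&
           (infomask & HEAP_XMAX_IS_MULTI) == 0 &&
           raw_xmax >= FIRST_NORMAL_XID &&
           clog_says_committed;
}

/* The combination reported in this thread; xid 1000 is a made-up value. */
void
thread_example(void)
{
    uint16_t infomask = HEAP_UPDATED | HEAP_XMAX_INVALID;

    /* clog says committed: hint bit and clog disagree. */
    assert(hint_contradicts_clog(infomask, 1000, true));
    /* clog says not committed (aborted updater): no contradiction. */
    assert(!hint_contradicts_clog(infomask, 1000, false));
}
```

Taking the clog answer as a plain bool keeps the sketch independent of the 
backend; in a real investigation that value would come from something like 
txid_status() for the xmax in question.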

Looking at the git logs, it seems 699bf7d05c68734f800052829427c20674eb2c6b 
introduced the check that is ereporting, and did so along with commit 
9c2f0a6c3cc8bb85b78191579760dbe9fb7814ec, which cleaned up some corruption 
bugs.  I wonder if you're just unlucky enough to have had one of these 
corruptions, and now you're bumping into the ereport which is intended to 
prevent the corruption from spreading further?
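
To make it concrete why that ereport fires here, below is a simplified model 
of the decision as I understand it from this thread. It is a sketch under 
assumptions, not the actual heap_prepare_freeze_tuple() code: 
model_freeze_xmax(), FreezeOutcome and xid_precedes() are hypothetical names, 
the lock-only test is reduced to a single bit, and multixacts are skipped. The 
key assumption is that the check looks at the raw xmax and at clog, and does 
not consult the HEAP_XMAX_INVALID hint first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FIRST_NORMAL_XID     3u      /* assumption: lower xids are special */
#define HEAP_XMAX_LOCK_ONLY  0x0080  /* from htup_details.h (verify!) */
#define HEAP_XMAX_IS_MULTI   0x1000  /* from htup_details.h (verify!) */

typedef enum
{
    FREEZE_KEEP_XMAX,            /* nothing to do for xmax */
    FREEZE_CLEAR_XMAX,           /* old, unimportant xmax: safe to remove */
    FREEZE_ERROR_COMMITTED_XMAX  /* "cannot freeze committed xmax" */
} FreezeOutcome;

/* Modulo-2^32 comparison of two normal xids (wraparound-aware). */
bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Simplified model: xmax_committed stands in for the clog lookup.
 * Note that HEAP_XMAX_INVALID in infomask is never inspected here.
 */
FreezeOutcome
model_freeze_xmax(uint16_t infomask, TransactionId xmax,
                  TransactionId cutoff_xid, bool xmax_committed)
{
    assert(cutoff_xid >= FIRST_NORMAL_XID);

    if (infomask & HEAP_XMAX_IS_MULTI)
        return FREEZE_KEEP_XMAX;     /* multixacts not modelled */
    if (xmax < FIRST_NORMAL_XID)
        return FREEZE_KEEP_XMAX;     /* invalid or special xmax */
    if (!xid_precedes(xmax, cutoff_xid))
        return FREEZE_KEEP_XMAX;     /* too recent to freeze */

    /* A committed, non-lock-only xmax must never be thrown away. */
    if (!(infomask & HEAP_XMAX_LOCK_ONLY) && xmax_committed)
        return FREEZE_ERROR_COMMITTED_XMAX;

    return FREEZE_CLEAR_XMAX;
}
```

Under this model, a tuple with an xmax older than cutoff_xid that clog reports 
as committed yields FREEZE_ERROR_COMMITTED_XMAX whether or not 
HEAP_XMAX_INVALID is set, which matches the symptom reported upthread.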

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company