Hi,

In a development branch of mine Thomas / the CF bot found a relatively
rare regression failures. That turned out to be because there was an
edge case in which heap_page_prune() was a bit more pessimistic than
lazy_scan_heap(). But I wonder if this isn't an issue more broadly:

The issue I am concerned about is lazy_scan_heap()'s logic for DEAD HOT
updated tuples:

                                        /*
                                         * Ordinarily, DEAD tuples would have 
been removed by
                                         * heap_page_prune(), but it's possible 
that the tuple
                                         * state changed since 
heap_page_prune() looked.  In
                                         * particular an INSERT_IN_PROGRESS 
tuple could have
                                         * changed to DEAD if the inserter 
aborted.  So this
                                         * cannot be considered an error 
condition.
                                         *
                                         * If the tuple is HOT-updated then it 
must only be
                                         * removed by a prune operation; so we 
keep it just as if
                                         * it were RECENTLY_DEAD.  Also, if 
it's a heap-only
                                         * tuple, we choose to keep it, because 
it'll be a lot
                                         * cheaper to get rid of it in the next 
pruning pass than
                                         * to treat it like an indexed tuple. 
Finally, if index
                                         * cleanup is disabled, the second heap 
pass will not
                                         * execute, and the tuple will not get 
removed, so we must
                                         * treat it like any other dead tuple 
that we choose to
                                         * keep.
                                         *
                                         * If this were to happen for a tuple 
that actually needed
                                         * to be deleted, we'd be in trouble, 
because it'd
                                         * possibly leave a tuple below the 
relation's xmin
                                         * horizon alive.  
heap_prepare_freeze_tuple() is prepared
                                         * to detect that case and abort the 
transaction,
                                         * preventing corruption.
                                         */
                                        if (HeapTupleIsHotUpdated(&tuple) ||
                                                HeapTupleIsHeapOnly(&tuple) ||
                                                params->index_cleanup == 
VACOPT_TERNARY_DISABLED)
                                                nkeep += 1;
                                        else
                                                tupgone = true; /* we can 
delete the tuple */
                                        all_visible = false;
                                        break;

In the case the HOT logic triggers, we'll call
heap_prepare_freeze_tuple() even when the tuple is dead. Which then can
lead us to
                if (TransactionIdPrecedes(xid, cutoff_xid))
                {
                        /*
                         * If we freeze xmax, make absolutely sure that it's 
not an XID
                         * that is important.  (Note, a lock-only xmax can be 
removed
                         * independent of committedness, since a committed lock 
holder has
                         * released the lock).
                         */
                        if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
                                TransactionIdDidCommit(xid))
                                ereport(PANIC,
                                                
(errcode(ERRCODE_DATA_CORRUPTED),
                                                 errmsg_internal("cannot freeze 
committed xmax %u",
                                                                                
 xid)));
                        freeze_xmax = true;

(before those errors we'd just have unset xmax)

Now obviously the question is whether it's possible that
heap_page_prune() left alive anything that could be seen as DEAD for the
check in lazy_scan_heap(), and that additionally is older than the
cutoff passed to heap_prepare_freeze_tuple().

I'm not sure - it seems like it could be possible in some corner cases,
when transactions abort after the heap_page_prune() but before the
second HeapTupleSatisfiesVacuum().

But regardless of whether it's possible today, it seems extremely
fragile. ISTM we should at least have a bunch of additional error checks
in the HOT branch for HEAPTUPLE_DEAD.

Greetings,

Andres Freund


Reply via email to