Hi, In a development branch of mine Thomas / the CF bot found a relatively rare regression failures. That turned out to be because there was an edge case in which heap_page_prune() was a bit more pessimistic than lazy_scan_heap(). But I wonder if this isn't an issue more broadly:
The issue I am concerned about is lazy_scan_heap()'s logic for DEAD HOT updated tuples: /* * Ordinarily, DEAD tuples would have been removed by * heap_page_prune(), but it's possible that the tuple * state changed since heap_page_prune() looked. In * particular an INSERT_IN_PROGRESS tuple could have * changed to DEAD if the inserter aborted. So this * cannot be considered an error condition. * * If the tuple is HOT-updated then it must only be * removed by a prune operation; so we keep it just as if * it were RECENTLY_DEAD. Also, if it's a heap-only * tuple, we choose to keep it, because it'll be a lot * cheaper to get rid of it in the next pruning pass than * to treat it like an indexed tuple. Finally, if index * cleanup is disabled, the second heap pass will not * execute, and the tuple will not get removed, so we must * treat it like any other dead tuple that we choose to * keep. * * If this were to happen for a tuple that actually needed * to be deleted, we'd be in trouble, because it'd * possibly leave a tuple below the relation's xmin * horizon alive. heap_prepare_freeze_tuple() is prepared * to detect that case and abort the transaction, * preventing corruption. */ if (HeapTupleIsHotUpdated(&tuple) || HeapTupleIsHeapOnly(&tuple) || params->index_cleanup == VACOPT_TERNARY_DISABLED) nkeep += 1; else tupgone = true; /* we can delete the tuple */ all_visible = false; break; In the case the HOT logic triggers, we'll call heap_prepare_freeze_tuple() even when the tuple is dead. Which then can lead us to if (TransactionIdPrecedes(xid, cutoff_xid)) { /* * If we freeze xmax, make absolutely sure that it's not an XID * that is important. (Note, a lock-only xmax can be removed * independent of committedness, since a committed lock holder has * released the lock). */ if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) && TransactionIdDidCommit(xid)) ereport(PANIC, (errcode(ERRCODE_DATA_CORRUPTED), errmsg_internal("cannot freeze committed xmax %u", xid))); freeze_xmax = true; (before those errors we'd just have unset xmax) Now obviously the question is whether it's possible that heap_page_prune() left alive anything that could be seen as DEAD for the check in lazy_scan_heap(), and that additionally is older than the cutoff passed to heap_prepare_freeze_tuple(). I'm not sure - it seems like it could be possible in some corner cases, when transactions abort after the heap_page_prune() but before the second HeapTupleSatisfiesVacuum(). But regardless of whether it's possible today, it seems extremely fragile. ISTM we should at least have a bunch of additional error checks in the HOT branch for HEAPTUPLE_DEAD. Greetings, Andres Freund