Commit d70cf811, from 2014, promoted an Assert() within IndexBuildHeapScan() to a "can't happen" elog() error, in order to detect when a parent tuple cannot be found for some heap-only tuple -- if this happens, then it indicates corruption. I think that we should make it a full ereport(), with an errcode of ERRCODE_DATA_CORRUPTION, to match what Andres just added to code that deals with freezing (he promoted Assert()s to errors, just like the 2014 commit, though he went as far as making them ereport()s to begin with). Attached patch does this.
I propose a backpatch to 9.3, partially for the sake of tools like amcheck, where users may only be on the lookout for ERRCODE_DATA_CORRUPTION and ERRCODE_INDEX_CORRUPTED. FWIW, an old MultiXact/recovery bug, alluded to by the commit message of d70cf811 [1] (and fixed by 6bfa88acd) was the cause of some déjà vu for me while looking into the "freeze the dead" issues. Because the enhanced amcheck [2] actually raised this error when I went to verify the first "freeze the dead" bugfix [3], it's clearly effective as a test for certain types of corruption. If CREATE INDEX/IndexBuildHeapScan() didn't already perform this check, then it would probably be necessary for amcheck to implement it on its own. What heap_get_root_tuples() does for us here is ideally suited to finding inconsistencies in HOT chains, because it matches xmin against xmax, looks at line pointer bits/redirects, and consults pg_multixact if necessary. The only thing that it *doesn't* do is make sure that hint bits accurately reflect what it says in the CLOG -- we'll need to find another way to do that, by directly targeting heap relations with their own function. In short, it does an awful lot for tools like amcheck, and I want to make sure that we get the full benefit of that. [1] https://www.postgresql.org/message-id/CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=o67w0csswpvv7xfuc...@mail.gmail.com [2] https://github.com/petergeoghegan/amcheck#optional-heapallindexed-verification [3] https://postsgr.es/m/cah2-wznm4rcrhfaiwkpwtpew2bxdtgrozk7jwwgucxeh3d1...@mail.gmail.com -- Peter Geoghegan
From 0a37ceea53736e39d0f1af72ec9bc07d98833370 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan <p...@bowt.ie> Date: Fri, 15 Dec 2017 14:05:16 -0800 Subject: [PATCH] Promote "HOT parent tuple" elog to an ereport. Commit d70cf811, from 2014, promoted an assert within IndexBuildHeapScan() (and another similar assert) to a "can't happen" elog() error, in order to detect when a parent tuple cannot be found for some heap-only tuple during CREATE INDEX/REINDEX. When the error is raised, it indicates heap corruption. This has proven useful as a corruption smoke-test while investigating recent corruption-inducing bugs in pruning/freezing. This commit promotes those elog() errors to ereport() errors. The errors are emitted with ereport rather than elog, despite being "should never happen" messages, so a proper error code is emitted. To avoid superfluous translations, mark messages as internal. This is done primarily for the benefit of tools like amcheck, that may want to piggy-back on IndexBuildHeapScan() to perform integrity checking. It's worth spelling out the nature of the problem for users of these tools. Author: Peter Geoghegan Reviewed-By: Andres Freund Backpatch: 9.3- --- src/backend/catalog/index.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 0125c18..e43482d 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -2595,9 +2595,12 @@ IndexBuildHeapRangeScan(Relation heapRelation, offnum = ItemPointerGetOffsetNumber(&heapTuple->t_self); if (!OffsetNumberIsValid(root_offsets[offnum - 1])) - elog(ERROR, "failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"", - ItemPointerGetBlockNumber(&heapTuple->t_self), - offnum, RelationGetRelationName(heapRelation)); + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"", + ItemPointerGetBlockNumber(&heapTuple->t_self), + offnum, + RelationGetRelationName(heapRelation)))); ItemPointerSetOffsetNumber(&rootTuple.t_self, root_offsets[offnum - 1]); @@ -3060,10 +3063,12 @@ validate_index_heapscan(Relation heapRelation, { root_offnum = root_offsets[root_offnum - 1]; if (!OffsetNumberIsValid(root_offnum)) - elog(ERROR, "failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"", - ItemPointerGetBlockNumber(heapcursor), - ItemPointerGetOffsetNumber(heapcursor), - RelationGetRelationName(heapRelation)); + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"", + ItemPointerGetBlockNumber(heapcursor), + ItemPointerGetOffsetNumber(heapcursor), + RelationGetRelationName(heapRelation)))); ItemPointerSetOffsetNumber(&rootTuple, root_offnum); } -- 2.7.4