On Tue, Oct 3, 2017 at 9:48 AM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> But that still doesn't fix the problem;
> as far as I can see, vacuum removes the root of the chain, not yet sure
> why, and then things are just as corrupted as before.

Are you sure it's not opportunistic pruning? Another thing that I've
noticed with this problem is that the relevant IndexTuple will pretty
quickly vanish, presumably due to LP_DEAD setting (but maybe not
actually due to LP_DEAD setting).

(Studies the problem some more...)

I now think that it actually is a VACUUM problem, specifically a
problem with VACUUM pruning. You see the HOT xmin-to-xmax check
pattern that you mentioned within heap_prune_chain(), which looks like
where the incorrect tuple prune (or possibly, at times, redirect?)
takes place. (I refer to the prune/kill that you mentioned today, that
frustrated your first attempt at a fix -- "I modified the multixact
freeze code...".)

The attached patch "fixes" the problem -- I cannot get amcheck to
complain about corruption with this applied. And, "make check-world"
passes. Hopefully it goes without saying that this isn't actually my
proposed fix. It tells us something that this at least *masks* the
problem, though; it's a start.

FYI, the repro case page contents look like this with the patch applied:

postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask, to_hex(t_infomask2) as infomask2 from heap_page_items(get_raw_page('t', 0));
 lp | lp_flags | t_xmin  | t_xmax | t_ctid | infomask | infomask2
----+----------+---------+--------+--------+----------+-----------
  1 |        1 | 1845995 |      0 | (0,1)  | b02      | 3
  2 |        2 |         |        |        |          |
  3 |        0 |         |        |        |          |
  4 |        0 |         |        |        |          |
  5 |        0 |         |        |        |          |
  6 |        0 |         |        |        |          |
  7 |        1 | 1846001 |      0 | (0,7)  | 2b02     | 8003
(7 rows)

--
Peter Geoghegan

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 52231ac..90eb39e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -470,13 +470,6 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 		ItemPointerSet(&(tup.t_self), BufferGetBlockNumber(buffer), offnum);
 
 		/*
-		 * Check the tuple XMIN against prior XMAX, if any
-		 */
-		if (TransactionIdIsValid(priorXmax) &&
-			!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
-			break;
-
-		/*
 		 * OK, this tuple is indeed a member of the chain.
 		 */
 		chainitems[nchain++] = offnum;