On Wed, Mar 22, 2023 at 1:45 PM Andres Freund <and...@anarazel.de> wrote: > At the very least there's missing verification that tids actually exists in > the > "Update chain validation" loop, leading to: > TRAP: failed Assert("ItemIdHasStorage(itemId)"), File: > "../../../../home/andres/src/postgresql/src/include/storage/bufpage.h", Line: > 355, PID: 645093 > > Which makes sense - the earlier loop adds t_ctid to the successor array, which > we then query without checking if there still is such a tid on the page. > > I suspect we don't just need a !ItemIdIsUsed(), but also a check gainst the > max offset on the page.
We definitely need to do it that way, since a heap-only tuple's t_ctid is allowed to point to almost anything. I guess it can't point to some completely different heap block, but that's about the only restriction. In particular, it can point to an item that's past the end of the page following line pointer array truncation (truncation can happen during pruning or when the second heap pass takes place in VACUUM). OTOH the rules for LP_REDIRECT items *are* very strict. They need to be, since it's the root item of the HOT chain, referenced by TIDs in indexes, and have no heap tuple header metadata to use in cross-checks that take place during HOT chain traversal (traversal by code in places such as heap_hot_search_buffer). > WRT these failures: > non-heap-only update produced a heap-only tuple at offset 20 > > I think that's largely a consequence of HeapTupleHeaderIsHotUpdated()'s > definition: That has to be a problem for verify_heapam. > Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is set > and expects to find an item it can dereference - but I don't think that's > something we can rely on: Afaics HOT pruning can break chains, but doesn't > reset xmax. I think that we need two passes to be completely thorough. An initial pass, that works pretty much as-is, plus a second pass that locates any orphaned heap-only tuples -- heap-only tuples that were not deemed part of a valid HOT chain during the first pass. These remaining orphaned heap-only tuples should be verified as having come from now-aborted transactions (they should definitely be fully DEAD) -- otherwise we have corruption. That's what my abandoned patch to make heap pruning more robust did, you'll recall. -- Peter Geoghegan