On Mon, Mar 15, 2021 at 4:11 PM Andres Freund <and...@anarazel.de> wrote: > > > I'm not comfortable with this change without adding more safety > > > checks. If there's ever a case in which the HEAPTUPLE_DEAD case is hit > > > and the xid needs to be frozen, we'll either cause errors or > > > corruption. Yes, that's already the case with params->index_cleanup == > > > DISABLED, but that's not that widely used. > > > > I noticed that Noah's similar 2013 patch [1] added a defensive > > heap_tuple_needs_freeze() + elog(ERROR) to the HEAPTUPLE_DEAD case. I > > suppose that that's roughly what you have in mind here? > > I'm not sure that's sufficient. If the case is legitimately reachable > (I'm maybe 60% is not, after staring at it for a long time, but ...), > then we can't just error out when we didn't so far.
If you're only 60% sure that the heap_tuple_needs_freeze() error thing doesn't break anything that should work by now then it seems unlikely that you'll ever get past 90% sure. I think that we should make a conservative assumption that the defensive elog(ERROR) won't be sufficient, and proceed on that basis. > I kinda wonder whether this case should just be handled by just gotoing > back to the start of the blkno loop, and redoing the pruning. The only > thing that makes that a bit more complicatd is that we've already > incremented vacrelstats->{scanned_pages,vacrelstats->tupcount_pages}. That seems like a good solution to me -- this is a very seldom hit path, so we can be a bit inefficient without it mattering. It might make sense to *also* check some things (maybe using heap_tuple_needs_freeze()) in passing, just for debugging purposes. > We really should put the per-page work (i.e. the blkno loop body) of > lazy_scan_heap() into a separate function, same with the > too-many-dead-tuples branch. +1. BTW I've noticed that the code (and code like it) tends to confuse things that the current VACUUM performed versus things by *some* VACUUM (that may or may not be current one). This refactoring might be a good opportunity to think about that as well. > > * It is assumed that the caller has checked the tuple with > > * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD > > * (else we should be removing the tuple, not freezing it). > > > > Does that need work too? > > I'm pretty scared of the index-cleanup-disabled path, for that reason. I > think the hot path is more likely to be unproblematic, but I'd not bet > my (nonexistant) farm on it. Well if we can solve the problem by simply doing pruning once again in the event of a HEAPTUPLE_DEAD return value from the lazy_scan_heap() HTSV call, then the comment becomes 100% true (which is not the case even today). -- Peter Geoghegan