On Tue, Jun 25, 2024 at 8:03 AM Andres Freund <and...@anarazel.de> wrote: > I think that's going in the wrong direction. We *want* to prune more > aggressively if we can (*), the necessary state is represented by the > vistest. That's a different thing than *having* to prune tuples beyond a > certain xmin (the cutoff determined by vacuum.c/vacuumlazy.c). The problem > we're having here is that the two states can get out of sync due to the > vistest "moving backwards", because of hot_standby_feedback (and perhaps also > an issue around aborts).
I agree that we want to prune more aggressively if we can. I think that fixing this by preventing vistest from going backward is reasonable, and I like it better than what Melanie proposed, although I like what Melanie proposed much better than not fixing it! I'm not sure how to do that cleanly, but one of you may have an idea. I do think that having a bunch of different XID values that function as horizons and a vistest object that holds some more XID horizons floating around in vacuum makes the code hard to understand. The relationships between the various values are not well-documented. For instance, the vistest has to be after vacrel->cutoffs.OldestXmin for correctness, but I don't think there's a single comment anywhere saying that; meanwhile, the comments for VacuumCutoffs say "OldestXmin is the Xid below which tuples deleted by any xact (that committed) should be considered DEAD, not just RECENTLY_DEAD." Surely the reader can be forgiven for thinking that this is the cutoff that will actually be used by pruning, but it isn't. And more generally, it seems like a fairly big problem to me that LVRelState directly stores NewRelfrozenXid; contains a VacuumCutoffs object that stores relfrozenxid, OldestXmin, and FreezeLimit; and also points to a GlobalVisState object that contains definitely_needed and maybe_needed. That is six different XID cutoffs for one vacuum operation. That's a lot. I can't describe how they're all different from each other or what the necessary relationships between them are off-hand, and I bet nobody else could either, at least until recently, else we might not have this bug. I feel like if it were possible to have fewer of them and still have things work, we'd be better off. I'm not sure that's doable. But six seems like a lot. -- Robert Haas EDB: http://www.enterprisedb.com