Hi, On 2020-07-24 11:06:58 -0400, Robert Haas wrote: > On Thu, Jul 23, 2020 at 2:10 PM Andres Freund <and...@anarazel.de> wrote: > > In the case the HOT logic triggers, we'll call > > heap_prepare_freeze_tuple() even when the tuple is dead. > > I think this is very bad. I've always been confused about these > comments, but I couldn't quite put my finger on the problem. Now I > think I can: the comments here imagine that we have the option either > to set tupgone, causing the line pointer to be marked unused by an > eventual call to lazy_vacuum_page(), or that we can decline to set > tupgone, which will leave the tuple around to be handled by the next > vacuum.
Yea. I think the only saving grace is that it's not obvious when the situation can arise without prior corruption. But even if that's actuall impossible, it seems extremely fragile. I stared at heap_prune_chain() for quite a while and couldn't convince myself either way. > However, we don't really have a choice at all. A choice implies that > either option is correct, and therefore we can elect the one we > prefer. But here, it's not just that one option is incorrect, but that > both options are incorrect. Setting tupgone controls whether or not > the tuple is considered for freezing. If we DON'T consider freezing > it, then we might manage to advance relfrozenxid while an older XID > still exists in the table. If we DO consider freezing it, we will > correctly conclude that it needs to be frozen, but then the freezing > code is in an impossible situation, because it has no provision for > getting rid of tuples, only for keeping them. Its logic assumes that > whenever we are freezing xmin or xmax we do that in a way that causes > the tuple to be visible to everyone, but this tuple should be visible > to no one. So with your changes it now errors out instead of > corrupting data, but that's just replacing one bad thing (data > corruption) with another (VACUUM failures). I suspect that the legitimate cases hitting this branch won't error out, because then xmin/xmax aren't old enough to need to be frozen. > I think the actual correct behavior here is to mark the line pointer > as dead. That's not trivial, because just doing that naively will break HOT. > The easiest way to accomplish that is probably to have > lazy_scan_heap() just emit an extra XLOG_HEAP2_CLEAN record beyond > whatever HOT-pruning already did, if it's necessary. A better solution > would probably be to merge HOT-pruning with setting things all-visible > and have a single function that does both, but that seems a lot more > invasive, and definitely unsuitable for back-patching. I suspect that merging pruning and this logic in lazy_scan_heap() really is the only proper way to solve this kind of issue. I wonder if, given we don't know if this issue can be hit in a real database, and given that it already triggers an error, the right way to deal with this in the back-branches is to emit a more precise error message. I.e. if we hit this branch, and either xmin/xmax are older than the cutoff, then we issue a more specific ERROR. Greetings, Andres Freund