Robert Haas <robertmh...@gmail.com> writes:
> I think we can improve this a bit further by also introducing a
> HEAP_XMIN_FROZEN bit that we set in lieu of overwriting XMIN with
> FrozenXID.  This allows us to freeze tuples aggressively - if we want
> - without losing any forensic information.

So far so good ...

> We can then modify the
> above algorithm slightly, so that when we observe that a page is all
> visible, we not only set PD_ALL_VISIBLE on the page but also
> HEAP_XMIN_FROZEN on each tuple.  The WAL record marking the page as
> all-visible then doubles as a WAL record marking it frozen,
> eliminating the need to dirty the page yet again at anti-wraparound
> vacuum time.

but this seems a lot more dubious/fragile.  The basic problem is that
it's not clear whether HEAP_XMIN_FROZEN is a hint bit or essential data.
If you want to set it without the overhead of an LSN bump or a possible
FPI in WAL, then it's a hint bit.  But if you're using it to protect
clog truncation then it's essential data.

Perhaps you can make this work but there are some nonobvious
requirements:

1. Seeing PD_ALL_VISIBLE set does not excuse vacuum from having to
iterate through all the tuples on the page checking for
HEAP_XMIN_FROZEN.  This is because the non-logged update of the page
might have been torn on the way to disk, such that PD_ALL_VISIBLE got
set but not all of the FROZEN bits did.

2. During an anti-wraparound vacuum, you *need to* emit a WAL record
when setting HEAP_XMIN_FROZEN.  It's not a hint, any more than writing
FrozenXID is now.

Actually, #2 isn't even good enough.  What if vacuum passes over a page
and finds all the FROZEN bits set, but the reason they're set is that
somebody else updated them in hint fashion microseconds before?  It
seems possible that those bits might not make it to disk before a
subsequent crash.  The only way to be really sure those bits are set is
to emit a WAL record that says to set them, whether or not they seem to
be set already.

While the WAL record could be small, you'd need one for every page,
making the argument that this saves I/O somewhat dubious.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
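
The torn-page hazard in point 1, and the argument that only an unconditional WAL record makes the bits trustworthy, can be sketched with a toy model. This is an illustration only, not PostgreSQL code: the Page and HeapTuple classes, every function name, and the assumption that a torn write persists the page flag plus only the first few tuples are all invented for the example.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HeapTuple:
    xmin: int
    xmin_frozen: bool = False  # stands in for the proposed HEAP_XMIN_FROZEN bit


@dataclass
class Page:
    all_visible: bool = False  # stands in for PD_ALL_VISIBLE
    tuples: List[HeapTuple] = field(default_factory=list)


def set_all_visible_unlogged(page: Page) -> None:
    """Hint-style update: change the in-memory page, write no WAL."""
    page.all_visible = True
    for tup in page.tuples:
        tup.xmin_frozen = True


def torn_write(mem: Page, disk: Page, tuples_flushed: int) -> None:
    """Hypothetical torn write: the page flag and only the first
    `tuples_flushed` tuple bits reach disk before the crash."""
    disk.all_visible = mem.all_visible
    for i in range(tuples_flushed):
        disk.tuples[i].xmin_frozen = mem.tuples[i].xmin_frozen


def oldest_unfrozen_xmin(page: Page, trust_page_flag: bool) -> Optional[int]:
    """What vacuum would conclude about the oldest XID still needing clog.
    None means 'nothing on this page blocks clog truncation'."""
    if trust_page_flag and page.all_visible:
        return None  # skips the per-tuple check, as point 1 warns against
    unfrozen = [t.xmin for t in page.tuples if not t.xmin_frozen]
    return min(unfrozen) if unfrozen else None


def replay_freeze_record(page: Page) -> None:
    """WAL replay sets the bits whether or not they already appear set."""
    page.all_visible = True
    for tup in page.tuples:
        tup.xmin_frozen = True


def demo():
    disk = Page(tuples=[HeapTuple(x) for x in (100, 101, 102, 103)])
    mem = copy.deepcopy(disk)
    set_all_visible_unlogged(mem)
    torn_write(mem, disk, tuples_flushed=2)  # crash part-way through the write

    trusting = oldest_unfrozen_xmin(disk, trust_page_flag=True)
    checking = oldest_unfrozen_xmin(disk, trust_page_flag=False)

    replay_freeze_record(disk)  # recovery replays the logged freeze
    after_replay = oldest_unfrozen_xmin(disk, trust_page_flag=False)
    return trusting, checking, after_replay
```

In this model, a vacuum that trusts the page flag reports nothing blocking clog truncation, while the per-tuple check still finds xmin 102 unfrozen. Only after the logged record is replayed do the two views agree.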