Since we're batting around ideas about freezing, let me write down an idea I've been pondering and discussing with various people for years. I don't think I invented it myself; apologies to whoever did for not giving credit.

The reason we have to freeze is that otherwise our 32-bit XIDs wrap around and become ambiguous. The obvious solution is to extend XIDs to 64 bits, but that would waste a lot of space. The trick is to add a field to the page header indicating the 'epoch' of the XIDs on the page, while keeping the XIDs in the tuple headers 32 bits wide (*).

The other reason we freeze is to truncate the clog. But with 64-bit XIDs, we wouldn't actually need to change old XIDs on disk to FrozenXid. Instead, we could implicitly treat anything older than relfrozenxid as frozen.
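A minimal C sketch of the implicit-freeze rule, assuming 64-bit XIDs are available when a tuple is read. `XidStatus`, `ClogLookupFn`, `tuple_xid_status()`, `clog_truncation_point()` and `toy_clog_lookup()` are hypothetical names, not PostgreSQL APIs. The sketch also assumes VACUUM has already removed or cleaned up every tuple carrying an aborted transaction's XID before advancing relfrozenxid, since the rule reports anything older than relfrozenxid as committed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { XID_IN_PROGRESS, XID_COMMITTED, XID_ABORTED } XidStatus;

/* Hypothetical commit-log accessor; only valid for XIDs that have not
 * been truncated away. */
typedef XidStatus (*ClogLookupFn)(uint64_t xid);

/*
 * Status of a 64-bit tuple XID. Anything older than the table's
 * relfrozenxid is implicitly frozen: it is reported as committed without
 * consulting the clog, so the XID on disk never has to be rewritten.
 * Assumes VACUUM already removed or cleaned up tuples carrying XIDs of
 * aborted transactions before it advanced relfrozenxid.
 */
static XidStatus
tuple_xid_status(uint64_t xid, uint64_t relfrozenxid, ClogLookupFn clog_lookup)
{
    if (xid < relfrozenxid)
        return XID_COMMITTED;       /* implicitly frozen, no clog access */
    return clog_lookup(xid);
}

/*
 * The clog is only consulted for XIDs at or above some table's
 * relfrozenxid, so everything below the smallest one can be truncated.
 */
static uint64_t
clog_truncation_point(const uint64_t *relfrozenxids, size_t n)
{
    assert(n > 0);
    uint64_t oldest = relfrozenxids[0];
    for (size_t i = 1; i < n; i++)
        if (relfrozenxids[i] < oldest)
            oldest = relfrozenxids[i];
    return oldest;
}

/* Toy stand-in for a real clog, for illustration only: even XIDs
 * committed, odd XIDs aborted. */
static XidStatus
toy_clog_lookup(uint64_t xid)
{
    return (xid % 2 == 0) ? XID_COMMITTED : XID_ABORTED;
}
```

Because nothing below the smallest relfrozenxid is ever looked up, the clog can be truncated there without stamping FrozenXid onto a single tuple.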

That's the basic idea. Vacuum freeze would then only need to remove dead tuples; it wouldn't need to dirty pages that contain none.

Since we're not storing 64-bit wide XIDs on every tuple, we'd still need to replace the older XIDs on a page with FrozenXid whenever the difference between the smallest and largest XID on that page exceeds 2^31. But that would only happen when you're updating the page, in which case the page is dirtied anyway, so it wouldn't cause any extra I/O.
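The page-level rule can be sketched on a toy page. `ToyPage`, `widen()` and `page_stamp_xid()` are hypothetical; the only PostgreSQL values borrowed are FrozenTransactionId (2) and FirstNormalTransactionId (3). The sketch assumes every XID stored on the page is at most the page's reference XID, and that an XID 2^31 behind the newest one is old enough to be known committed and visible to everyone, so overwriting it with FrozenXid is safe.

```c
#include <assert.h>
#include <stdint.h>

#define FROZEN_XID        2u   /* FrozenTransactionId in PostgreSQL */
#define FIRST_NORMAL_XID  3u   /* FirstNormalTransactionId */
#define HALF_XID_SPACE    (UINT64_C(1) << 31)
#define MAX_TUPLES        8

/* Hypothetical page: a 64-bit reference XID in the header and plain
 * 32-bit XIDs in the tuple headers. Invariant: every normal XID on the
 * page is at most ref_xid and less than 2^31 behind it. */
typedef struct {
    uint64_t ref_xid;
    int      ntuples;
    uint32_t xmin[MAX_TUPLES];
} ToyPage;

/* Full 64-bit value of a 32-bit XID: the one within 2^31 of ref. */
static uint64_t
widen(uint64_t ref, uint32_t xid32)
{
    return ref + (int64_t) (int32_t) (xid32 - (uint32_t) ref);
}

/*
 * Called only when a tuple stamped with new_xid is written to the page,
 * so the page is being dirtied anyway. Any XID that would end up 2^31 or
 * more behind the new reference is replaced with FrozenXid first.
 * Assumes such old XIDs are known committed and visible to everyone.
 */
static void
page_stamp_xid(ToyPage *page, uint64_t new_xid)
{
    assert(page->ntuples < MAX_TUPLES);
    /* Assumes XID assignment skips values whose low 32 bits are special. */
    assert((uint32_t) new_xid >= FIRST_NORMAL_XID);

    uint64_t new_ref = new_xid > page->ref_xid ? new_xid : page->ref_xid;
    assert(new_ref - new_xid < HALF_XID_SPACE);

    for (int i = 0; i < page->ntuples; i++)
    {
        uint32_t x = page->xmin[i];

        if (x < FIRST_NORMAL_XID)
            continue;                       /* already frozen or special */
        if (new_ref - widen(page->ref_xid, x) >= HALF_XID_SPACE)
            page->xmin[i] = FROZEN_XID;     /* too old to stay unambiguous */
    }
    page->ref_xid = new_ref;
    page->xmin[page->ntuples++] = (uint32_t) new_xid;
}
```

The scan over the page's XIDs only runs inside page_stamp_xid(), i.e. on a write that dirties the page regardless, which is why this freezing adds no extra I/O.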

This would also be the first step in allowing the clog to grow larger than 2 billion transactions, eliminating the need for anti-wraparound freezing altogether. You'd still want to truncate the clog eventually, but it would be nice to not be pressed against the wall with "run vacuum freeze now, or the system will shut down".

(*) "Adding an epoch" is inaccurate, but I like to use that as my mental model. If you just add a 32-bit epoch field, then you cannot have XIDs from different epochs on the page, which would be a problem. In reality, you would store one 64-bit XID value in the page header, and use that as the "reference point" for all the 32-bit XIDs on the tuples. See the existing convert_txid() function for how that works. Another method is to store the 32-bit XID values in tuple headers as offsets from the per-page 64-bit value, but then you'd always need to have the 64-bit value at hand when interpreting the XIDs, even if they're all recent.
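The two encodings from the footnote, sketched in C under the assumption that a page carries exactly one 64-bit value in its header. `xid_from_reference()`, `xid_to_offset()` and `xid_from_offset()` are hypothetical names; the first follows the same modulo-2^32 idea as the convert_txid() approach mentioned in the footnote, but is not copied from it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reference-point encoding (hypothetical names): tuples keep ordinary
 * 32-bit XIDs and the page header holds one 64-bit reference XID. The
 * full value is the 64-bit XID with the same low 32 bits that lies
 * within 2^31 of the reference, so XIDs from adjacent epochs can share
 * a page. Assumes the result is not negative, i.e. no XID on the page
 * predates the first epoch.
 */
static uint64_t
xid_from_reference(uint64_t page_ref_xid, uint32_t xid32)
{
    /* Signed distance from the reference, computed modulo 2^32. */
    int32_t delta = (int32_t) (xid32 - (uint32_t) page_ref_xid);

    return page_ref_xid + (int64_t) delta;
}

/*
 * Offset encoding: tuples store xid - page_base_xid, so decoding any
 * XID, even a recent one, needs the page's 64-bit base at hand.
 */
static uint32_t
xid_to_offset(uint64_t page_base_xid, uint64_t xid)
{
    assert(xid >= page_base_xid && xid - page_base_xid <= UINT32_MAX);
    return (uint32_t) (xid - page_base_xid);
}

static uint64_t
xid_from_offset(uint64_t page_base_xid, uint32_t offset)
{
    return page_base_xid + offset;
}
```

With the reference-point encoding, a tuple XID of 0xFFFFFFF0 on a page whose reference is (5 << 32) + 100 widens to (4 << 32) + 0xFFFFFFF0, i.e. it is read as slightly older than the reference rather than almost 2^32 newer, while a recent XID keeps its ordinary 32-bit meaning.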

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)