On Thu, May 30, 2013 at 2:39 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Random thought: Could you compute the reference XID based on the page
> LSN? That would eliminate the storage overhead.

After mulling this over a bit, I think this is definitely possible. We begin a new "half-epoch" every 2 billion transactions. We remember the LSN at which the current half-epoch began and the LSN at which the previous half-epoch began. When a new half-epoch begins, the first backend that wants to stamp a tuple with an XID from the new half-epoch must first emit a "new half-epoch" WAL record, which becomes the starting LSN for the new half-epoch.

We define a new page-level bit, something like PD_RECENTLY_FROZEN. When this bit is set, it means there are no unfrozen tuples on the page with XIDs that predate the current half-epoch. Whenever we know this to be true, we set the bit. If the page LSN crosses more than one half-epoch boundary at a time, we freeze the page and set the bit. If the page LSN crosses exactly one half-epoch boundary, then (1) if the bit is set, we clear it and (2) if the bit is not set, we freeze the page and set the bit.

The advantage of this is that we avoid an epidemic of freezing right after a half-epoch change. Immediately after a half-epoch change, many pages will mix tuples from the current and previous half-epoch, but relatively few pages will have tuples from the current half-epoch and a half-epoch more than one in the past.

As things stand today, we really only need to remember the last two half-epoch boundaries; they could be stored, for example, in the control file. But if we someday generalize CLOG to allow indefinite retention as you suggest, we could instead remember all half-epoch boundaries that have ever occurred; just maintain a file someplace with 8 bytes of data for every 2 billion XIDs consumed over the lifetime of the cluster. In fact, we might want to do it that way anyhow, just to keep our options open, and perhaps for forensics.
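
To make the page-level rule concrete, here is a rough sketch in Python rather than backend C. It is only a toy model of the bit transitions described above, not proposed code. The names Page, boundaries_crossed, freeze_page and advance_page_lsn are made up for illustration, LSNs are plain integers, and I am assuming a boundary b counts as crossed when old_lsn < b <= new_lsn.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Toy stand-in for a heap page (hypothetical, not a PostgreSQL struct)."""
    lsn: int
    recently_frozen: bool = False  # models the proposed PD_RECENTLY_FROZEN bit
    freeze_count: int = 0          # bookkeeping for this illustration only


def boundaries_crossed(old_lsn, new_lsn, boundaries):
    """Count half-epoch start LSNs b with old_lsn < b <= new_lsn (assumed convention)."""
    return sum(1 for b in boundaries if old_lsn < b <= new_lsn)


def freeze_page(page):
    """Placeholder for freezing every tuple on the page."""
    page.freeze_count += 1


def advance_page_lsn(page, new_lsn, boundaries):
    """Apply the proposed rule when a modification moves the page LSN forward."""
    crossed = boundaries_crossed(page.lsn, new_lsn, boundaries)
    if crossed > 1:
        # More than one boundary at once: freeze the page and set the bit.
        freeze_page(page)
        page.recently_frozen = True
    elif crossed == 1:
        if page.recently_frozen:
            # Only previous-half-epoch tuples can remain unfrozen, so no
            # freeze is needed; just clear the bit.
            page.recently_frozen = False
        else:
            # The page may hold tuples from two half-epochs back.
            freeze_page(page)
            page.recently_frozen = True
    page.lsn = new_lsn


if __name__ == "__main__":
    boundaries = [1000, 2000, 3000]  # made-up half-epoch start LSNs
    page = Page(lsn=500, recently_frozen=True)
    for target in (1500, 2500, 3500):
        advance_page_lsn(page, target, boundaries)
        print(target, page.recently_frozen, page.freeze_count)
```

In this model a page that is touched at least once per half-epoch gets frozen at most every other boundary, which is the point of the bit: the freezing work is spread out instead of landing right after the half-epoch change.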

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company