On Mon, Dec 30, 2024 at 10:08:31AM +0900, Michael Paquier wrote: > On Fri, Dec 27, 2024 at 06:16:24PM +0300, Vitaly Davydov wrote: > > As an idea, I would like to propose to store FullTransactionId in > > global transaction state instead of TransactionId. I'm not sure, it > > will consume significant additional memory, but it make the code > > more clear and probably result into less number of locks. > > FWIW, I was wondering about doing the same thing. However, I have > concluded that this some refactoring work out of the scope of fixing > the primary issue we have here, as we are hit by the way the file > names are built when we attempt to remove them. > > Note that the memory footprint of storing a FullTransactionId in > twophase.c's GlobalTransactionData does not worry me much. It is more > important to not increase the size of the two-phase state data, > impacting files on disk and their WAL records. This size argument > has been mentioned on the thread that has added epochs to the 2PC file > names, as far as I recall.
I agree with accepting +4 bytes in GlobalTransactionData. > At the end, I have applied two patches, the first one down to 13 that > took care of the "future" issue, with tests added in the v14~v16 > range. v13 was lacking a perl routine, and it's not a big deal to not > have coverage for this code path anyway. The second patch has been > applied down to v17, to fix the epoch issue, with the more advanced > tests. > > If you have a suggestion of patch to plug in a FullTransactionId into > GlobalTransactionData rather than a TransactionId, feel free to > propose something! Help is always welcome, and this would be > HEAD-only work, making it less urgent to deal with. I suspect we should do that and back-patch to v17 before the next releases, because commit 7e125b2 wrote this function: /* * Compute FullTransactionId for the given TransactionId, using the current * epoch. */ static inline FullTransactionId FullTransactionIdFromCurrentEpoch(TransactionId xid) { FullTransactionId fxid; FullTransactionId nextFullXid; uint32 epoch; nextFullXid = ReadNextFullTransactionId(); epoch = EpochFromFullTransactionId(nextFullXid); fxid = FullTransactionIdFromEpochAndXid(epoch, xid); return fxid; } I think "using the current epoch" is wrong for half of the nextFullXid values having epoch > 0. For example, nextFullId==2^32 is in epoch 1, but all the allowable XIDs are in epoch 0. (I mean "allowable" in the sense of AssertTransactionIdInAllowableRange().) From then until we assign another 2^31 XIDs, epochs 0 and 1 are both expected in XID values. After 2^31 XID assignments, every allowable XID be in epoch 1. Hence, twophase.c would need two-epoch logic like we have in widen_snapshot_xid() and XLogRecGetFullXid(). Is that right? (I wrote this in a hurry, so this email may have more than the standard level of errors.) Before commit 7e125b2, twophase also had that logic. I didn't work out the user-visible consequences of that logic's new absence here, but I bet on twophase recovery breakage. Similar problem here (up to two epochs are acceptable, not just one): + /* Discard files from past epochs */ + if (EpochFromFullTransactionId(fxid) < EpochFromFullTransactionId(nextXid)) I wrote the attached half-baked patch to fix those, but I tend to think it's better to use FullTransactionId as many places as possible in twophase.c. (We'd still need to convert XIDs that we read from xl_xact_prepare records, along the lines of XLogRecGetFullXid().) How do you see it?
Author: Noah Misch <n...@leadboat.com> Commit: Noah Misch <n...@leadboat.com> Fix twophase.c XID epoch tracking Half the time after epoch 0, allowable XIDs span two epochs. This would have no user-visible consequences during epoch 0, but I expect (unconfirmed) twophase breakage during other epochs. FIXME likely rework this in favor of broader fulltransaction use in twophase.c diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index a3190dc..7a16293 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -926,24 +926,6 @@ TwoPhaseGetDummyProc(TransactionId xid, bool lock_held) /* State file support */ /************************************************************************/ -/* - * Compute FullTransactionId for the given TransactionId, using the current - * epoch. - */ -static inline FullTransactionId -FullTransactionIdFromCurrentEpoch(TransactionId xid) -{ - FullTransactionId fxid; - FullTransactionId nextFullXid; - uint32 epoch; - - nextFullXid = ReadNextFullTransactionId(); - epoch = EpochFromFullTransactionId(nextFullXid); - - fxid = FullTransactionIdFromEpochAndXid(epoch, xid); - return fxid; -} - static inline int TwoPhaseFilePath(char *path, FullTransactionId fxid) { @@ -1283,7 +1265,7 @@ RegisterTwoPhaseRecord(TwoPhaseRmgrId rmid, uint16 info, * contents of the file, issuing an error when finding corrupted data. If * missing_ok is true, which indicates that missing files can be safely * ignored, then return NULL. This state can be reached when doing recovery - * after discarding two-phase files from other epochs. + * after discarding two-phase files from frozen epochs. */ static char * ReadTwoPhaseFile(TransactionId xid, bool missing_ok) @@ -1299,7 +1281,7 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok) int r; FullTransactionId fxid; - fxid = FullTransactionIdFromCurrentEpoch(xid); + fxid = FullTransactionIdFromAllowableAt(ReadNextFullTransactionId(), xid); TwoPhaseFilePath(path, fxid); fd = OpenTransientFile(path, O_RDONLY | PG_BINARY); @@ -1664,15 +1646,13 @@ FinishPreparedTransaction(const char *gid, bool isCommit) /* Count the prepared xact as committed or aborted */ AtEOXact_PgStat(isCommit, false); - /* - * And now we can clean up any files we may have left. These should be - * from the current epoch. - */ + /* And now we can clean up any files we may have left. */ if (ondisk) { FullTransactionId fxid; - fxid = FullTransactionIdFromCurrentEpoch(xid); + fxid = FullTransactionIdFromAllowableAt(ReadNextFullTransactionId(), + xid); RemoveTwoPhaseFile(fxid, true); } @@ -1749,8 +1729,7 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len) COMP_CRC32C(statefile_crc, content, len); FIN_CRC32C(statefile_crc); - /* Use current epoch */ - fxid = FullTransactionIdFromCurrentEpoch(xid); + fxid = FullTransactionIdFromAllowableAt(ReadNextFullTransactionId(), xid); TwoPhaseFilePath(path, fxid); fd = OpenTransientFile(path, @@ -1900,7 +1879,7 @@ CheckPointTwoPhase(XLogRecPtr redo_horizon) * This is called once at the beginning of recovery, saving any extra * lookups in the future. Two-phase files that are newer than the * minimum XID horizon are discarded on the way. Two-phase files with - * an epoch older or newer than the current checkpoint's record epoch + * an epoch frozen relative to the current checkpoint's record epoch * are also discarded. */ void @@ -1971,7 +1950,6 @@ PrescanPreparedTransactions(TransactionId **xids_p, int *nxids_p) TransactionId origNextXid = XidFromFullTransactionId(nextXid); TransactionId result = origNextXid; TransactionId *xids = NULL; - uint32 epoch = EpochFromFullTransactionId(nextXid); int nxids = 0; int allocsize = 0; int i; @@ -1988,11 +1966,7 @@ PrescanPreparedTransactions(TransactionId **xids_p, int *nxids_p) xid = gxact->xid; - /* - * All two-phase files with past and future epoch in pg_twophase are - * gone at this point, so we're OK to rely on only the current epoch. - */ - fxid = FullTransactionIdFromEpochAndXid(epoch, xid); + fxid = FullTransactionIdFromAllowableAt(nextXid, xid); buf = ProcessTwoPhaseBuffer(fxid, gxact->prepare_start_lsn, gxact->ondisk, false, true); @@ -2055,12 +2029,9 @@ void StandbyRecoverPreparedTransactions(void) { int i; - uint32 epoch; FullTransactionId nextFullXid; - /* get current epoch */ nextFullXid = ReadNextFullTransactionId(); - epoch = EpochFromFullTransactionId(nextFullXid); LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE); for (i = 0; i < TwoPhaseState->numPrepXacts; i++) @@ -2074,11 +2045,7 @@ StandbyRecoverPreparedTransactions(void) xid = gxact->xid; - /* - * At this stage, we're OK to work with the current epoch as all past - * and future files have been already discarded. - */ - fxid = FullTransactionIdFromEpochAndXid(epoch, xid); + fxid = FullTransactionIdFromAllowableAt(nextFullXid, xid); buf = ProcessTwoPhaseBuffer(fxid, gxact->prepare_start_lsn, gxact->ondisk, true, false); @@ -2108,12 +2075,9 @@ void RecoverPreparedTransactions(void) { int i; - uint32 epoch; FullTransactionId nextFullXid; - /* get current epoch */ nextFullXid = ReadNextFullTransactionId(); - epoch = EpochFromFullTransactionId(nextFullXid); LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE); for (i = 0; i < TwoPhaseState->numPrepXacts; i++) @@ -2127,10 +2091,6 @@ RecoverPreparedTransactions(void) TransactionId *subxids; const char *gid; - /* - * At this stage, we're OK to work with the current epoch as all past - * and future files have been already discarded. - */ xid = gxact->xid; /* @@ -2142,7 +2102,7 @@ RecoverPreparedTransactions(void) * SubTransSetParent has been set before, if the prepared transaction * generated xid assignment records. */ - fxid = FullTransactionIdFromEpochAndXid(epoch, xid); + fxid = FullTransactionIdFromAllowableAt(nextFullXid, xid); buf = ProcessTwoPhaseBuffer(fxid, gxact->prepare_start_lsn, gxact->ondisk, true, false); @@ -2260,8 +2220,9 @@ ProcessTwoPhaseBuffer(FullTransactionId fxid, return NULL; } - /* Discard files from past epochs */ - if (EpochFromFullTransactionId(fxid) < EpochFromFullTransactionId(nextXid)) + /* Discard files from frozen epochs */ + if (EpochFromFullTransactionId(fxid) + 1 < + EpochFromFullTransactionId(nextXid)) { if (fromdisk) { @@ -2576,8 +2537,8 @@ PrepareRedoAdd(char *buf, XLogRecPtr start_lsn, char path[MAXPGPATH]; FullTransactionId fxid; - /* Use current epoch */ - fxid = FullTransactionIdFromCurrentEpoch(hdr->xid); + fxid = FullTransactionIdFromAllowableAt(ReadNextFullTransactionId(), + hdr->xid); TwoPhaseFilePath(path, fxid); if (access(path, F_OK) == 0) @@ -2676,10 +2637,8 @@ PrepareRedoRemove(TransactionId xid, bool giveWarning) { FullTransactionId fxid; - /* - * We should deal with a file at the current epoch here. - */ - fxid = FullTransactionIdFromCurrentEpoch(xid); + fxid = FullTransactionIdFromAllowableAt(ReadNextFullTransactionId(), + xid); RemoveTwoPhaseFile(fxid, giveWarning); } RemoveGXact(gxact); diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index 3596af0..91b6a91 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -2166,28 +2166,14 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page) FullTransactionId XLogRecGetFullXid(XLogReaderState *record) { - TransactionId xid, - next_xid; - uint32 epoch; - /* * This function is only safe during replay, because it depends on the * replay state. See AdvanceNextFullTransactionIdPastXid() for more. */ Assert(AmStartupProcess() || !IsUnderPostmaster); - xid = XLogRecGetXid(record); - next_xid = XidFromFullTransactionId(TransamVariables->nextXid); - epoch = EpochFromFullTransactionId(TransamVariables->nextXid); - - /* - * If xid is numerically greater than next_xid, it has to be from the last - * epoch. - */ - if (unlikely(xid > next_xid)) - --epoch; - - return FullTransactionIdFromEpochAndXid(epoch, xid); + return FullTransactionIdFromAllowableAt(TransamVariables->nextXid, + XLogRecGetXid(record)); } #endif diff --git a/src/backend/utils/adt/xid8funcs.c b/src/backend/utils/adt/xid8funcs.c index 4736755..b173956 100644 --- a/src/backend/utils/adt/xid8funcs.c +++ b/src/backend/utils/adt/xid8funcs.c @@ -155,35 +155,6 @@ TransactionIdInRecentPast(FullTransactionId fxid, TransactionId *extracted_xid) } /* - * Convert a TransactionId obtained from a snapshot held by the caller to a - * FullTransactionId. Use next_fxid as a reference FullTransactionId, so that - * we can compute the high order bits. It must have been obtained by the - * caller with ReadNextFullTransactionId() after the snapshot was created. - */ -static FullTransactionId -widen_snapshot_xid(TransactionId xid, FullTransactionId next_fxid) -{ - TransactionId next_xid = XidFromFullTransactionId(next_fxid); - uint32 epoch = EpochFromFullTransactionId(next_fxid); - - /* Special transaction ID. */ - if (!TransactionIdIsNormal(xid)) - return FullTransactionIdFromEpochAndXid(0, xid); - - /* - * The 64 bit result must be <= next_fxid, since next_fxid hadn't been - * issued yet when the snapshot was created. Every TransactionId in the - * snapshot must therefore be from the same epoch as next_fxid, or the - * epoch before. We know this because next_fxid is never allow to get - * more than one epoch ahead of the TransactionIds in any snapshot. - */ - if (xid > next_xid) - epoch--; - - return FullTransactionIdFromEpochAndXid(epoch, xid); -} - -/* * txid comparator for qsort/bsearch */ static int @@ -420,12 +391,18 @@ pg_current_snapshot(PG_FUNCTION_ARGS) nxip = cur->xcnt; snap = palloc(PG_SNAPSHOT_SIZE(nxip)); - /* fill */ - snap->xmin = widen_snapshot_xid(cur->xmin, next_fxid); - snap->xmax = widen_snapshot_xid(cur->xmax, next_fxid); + /* + * Fill. This is the current backend's active snapshot, so MyProc->xmin + * is <= all these XIDs. As long as that remains so, oldestXid can't + * advance past any of these XIDs. Hence, these XIDs remain allowable + * relative to next_fxid. + */ + snap->xmin = FullTransactionIdFromAllowableAt(next_fxid, cur->xmin); + snap->xmax = FullTransactionIdFromAllowableAt(next_fxid, cur->xmax); snap->nxip = nxip; for (i = 0; i < nxip; i++) - snap->xip[i] = widen_snapshot_xid(cur->xip[i], next_fxid); + snap->xip[i] = + FullTransactionIdFromAllowableAt(next_fxid, cur->xip[i]); /* * We want them guaranteed to be in ascending order. This also removes diff --git a/src/include/access/transam.h b/src/include/access/transam.h index 0cab865..48fce16 100644 --- a/src/include/access/transam.h +++ b/src/include/access/transam.h @@ -77,6 +77,35 @@ FullTransactionIdFromEpochAndXid(uint32 epoch, TransactionId xid) return result; } +/* + * Compute FullTransactionId for the given TransactionId, assuming xid was + * between [oldestXid, nextXid] at the time when TransamVariables->nextXid was + * nextFullXid. + */ +static inline FullTransactionId +FullTransactionIdFromAllowableAt(FullTransactionId nextFullXid, + TransactionId xid) +{ + uint32 epoch; + + /* Special transaction ID. */ + if (!TransactionIdIsNormal(xid)) + return FullTransactionIdFromEpochAndXid(0, xid); + + /* + * The 64 bit result must be <= nextFullXid, since nextFullXid hadn't been + * issued yet when xid was in the past. The xid must therefore be from + * the epoch of nextFullXid or the epoch before. We know this because we + * must remove (by freezing) an XID before assigning the XID half an epoch + * ahead of it. + */ + epoch = EpochFromFullTransactionId(nextFullXid); + if (xid > XidFromFullTransactionId(nextFullXid)) + epoch--; + + return FullTransactionIdFromEpochAndXid(epoch, xid); +} + static inline FullTransactionId FullTransactionIdFromU64(uint64 value) {