Thanks for looking this! At Thu, 23 Mar 2023 14:15:12 +0530, Dilip Kumar <dilipbal...@gmail.com> wrote in > On Thu, Mar 23, 2023 at 10:53 AM Kyotaro Horiguchi > <horikyota....@gmail.com> wrote: > Thanks for working on this, your idea looks fine but my only worry is > that in the future if someone tries to change the logic of > XidInMVCCSnapshot() then they must be aware that the snap->xip array > and snap->subxip array no long distinguishes the top xids and subxids.
Yeah, I had the same thought when I was working on the posted version. > I agree with the current logic if we are not marking sub-overflow then > there is no issue, so can we document this in the SnapshotData > structure? (I found that it was alrady mentioned...) In a unpublished version (what I referred to as "a mess"), I added a flag called "topsub_mixed" to SnapshotData, indicating that XIDs of top and sub transactions are stored in xip and subxip arrays in a mixed manner. However, I eventually removed it since it could only be used for sanity checks related to suboverflowed. I inserted the following sentense in the middle of the comments for xip and subxip. > In the case of !suboverflowed, there's a situation where this > contains both top and sub-transaction IDs in a mixed manner. And added similar a similar sentense to a comment of XidInMVCCSnapshot. While doning this, I realized that we could simplify and optimize XID search code by combining the two XID arrays. If !suboverflowed, the array stored all active XIDs of both top and sub-transactions. Otherwise it only stores active top XIDs and maybe XIDs of some sub-transactions. If many subXIDs are stored when overflowed, there might lead to some degradation but I think the win we gain from searching just one XID array in most cases outweighs that. (I didn't do this (of course) in this version.) > Also, there are some typos in the patch > /idetify/identify > /carete/create > /Aallocate/Allocate Oops! Thanks for pointing out them. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
>From 9536d174b05f0d112dde71a4a4ca61ae08f4ae9a Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota....@gmail.com> Date: Tue, 19 Jul 2022 11:50:29 +0900 Subject: [PATCH v8] Lift initial snapshot limit for logical replication The replication command CREATE_REPLICATION_SLOT tends to fail with "snapshot too large" when many subtransactions are active. This problem stems from SnapBuildInitialSnapshot attempting to generate a snapshot in which all active XIDs, including subxids, are stored within the xip array. This patch mitigates the constraint on the acceptable number of transaction IDs by utilizing the subxid array. Author: Dilip Kumar and Kyotaro Horiguchi --- src/backend/replication/logical/snapbuild.c | 50 +++++++++++++++++---- src/backend/storage/ipc/procarray.c | 18 ++------ src/backend/utils/time/snapmgr.c | 8 ++-- src/include/access/transam.h | 33 ++++++++++++++ src/include/utils/snapshot.h | 12 +++-- 5 files changed, 90 insertions(+), 31 deletions(-) diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index 62542827e4..3a15c3cea7 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -565,10 +565,14 @@ Snapshot SnapBuildInitialSnapshot(SnapBuild *builder) { Snapshot snap; + int ntxn; + int xiplen; TransactionId xid; TransactionId safeXid; TransactionId *newxip; + TransactionId *newsubxip; int newxcnt = 0; + int newsubxcnt = 0; Assert(XactIsoLevel == XACT_REPEATABLE_READ); Assert(builder->building_full_snapshot); @@ -610,9 +614,35 @@ SnapBuildInitialSnapshot(SnapBuild *builder) MyProc->xmin = snap->xmin; - /* allocate in transaction context */ - newxip = (TransactionId *) - palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount()); + /* + * Allocate in transaction context + * + * Since we know all the active XIDs, this snapshot won't be + * suboverflowed. However, if the number of XIDs surpasses the XID arrays' + * capacity, we cannot establish this snapshot because we don't know which + * XIDs are top-level. Since the original snapshot is historical, we can't + * reliably identify the top-level XIDs. Thus, we have no choice but fail + * when there are too many active XIDs. + */ + ntxn = TransactionIdDistance(snap->xmin, snap->xmax) - snap->xcnt; + xiplen = GetMaxSnapshotXidCount(); + newxip = (TransactionId *) palloc(sizeof(TransactionId) * xiplen); + + if (ntxn > GetMaxSnapshotXidCount()) + { + /* + * Since we can't identify the top-level XIDs, we can't create a + * suboverflowed snapshot. + */ + if (ntxn > xiplen + GetMaxSnapshotSubxidCount()) + ereport(ERROR, + (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), + errmsg("initial slot snapshot too large"))); + + newsubxip = (TransactionId *) + palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount()); + } + /* * snapbuild.c builds transactions in an "inverted" manner, which means it @@ -633,21 +663,23 @@ SnapBuildInitialSnapshot(SnapBuild *builder) if (test == NULL) { - if (newxcnt >= GetMaxSnapshotXidCount()) - ereport(ERROR, - (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), - errmsg("initial slot snapshot too large"))); - - newxip[newxcnt++] = xid; + if (newxcnt < xiplen) + newxip[newxcnt++] = xid; + else + newsubxip[newsubxcnt++] = xid; } TransactionIdAdvance(xid); } + Assert(newxcnt + newsubxcnt == ntxn); + /* adjust remaining snapshot fields as needed */ snap->snapshot_type = SNAPSHOT_MVCC; snap->xcnt = newxcnt; snap->xip = newxip; + snap->subxcnt = newsubxcnt; + snap->subxip = newsubxip; return snap; } diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index ea91ce355f..a0f5ba5a36 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -4823,22 +4823,10 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid, Assert(TransactionIdPrecedesOrEquals(from_xid, to_xid)); /* - * Calculate how many array slots we'll need. Normally this is cheap; in - * the unusual case where the XIDs cross the wrap point, we do it the hard - * way. + * Calculate how many array slots we'll need. Both ends of the region are + * inclusive. */ - if (to_xid >= from_xid) - nxids = to_xid - from_xid + 1; - else - { - nxids = 1; - next_xid = from_xid; - while (TransactionIdPrecedes(next_xid, to_xid)) - { - nxids++; - TransactionIdAdvance(next_xid); - } - } + nxids = TransactionIdDistance(from_xid, to_xid) + 1; /* * Since only the startup process modifies the head/tail pointers, we diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c index 7d11ae3478..f7951d8787 100644 --- a/src/backend/utils/time/snapmgr.c +++ b/src/backend/utils/time/snapmgr.c @@ -2310,9 +2310,11 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot) /* * If the snapshot contains full subxact data, the fastest way to * check things is just to compare the given XID against both subxact - * XIDs and top-level XIDs. If the snapshot overflowed, we have to - * use pg_subtrans to convert a subxact XID to its parent XID, but - * then we need only look at top-level XIDs not subxacts. + * XIDs and top-level XIDs. Note that there's a case where both arrays + * contain top and sub-transaction's XIDs in a mixed manner. If the + * snapshot overflowed, we have to use pg_subtrans to convert a subxact + * XID to its parent XID, but then we need only look at top-level XIDs + * not subxacts. */ if (!snapshot->suboverflowed) { diff --git a/src/include/access/transam.h b/src/include/access/transam.h index f5af6d3055..b6a3c8abee 100644 --- a/src/include/access/transam.h +++ b/src/include/access/transam.h @@ -370,6 +370,39 @@ FullTransactionIdNewer(FullTransactionId a, FullTransactionId b) return b; } + +/* return the distance between the two IDs */ +static inline int +TransactionIdDistance(TransactionId from_xid, TransactionId to_xid) +{ + int nxids; + TransactionId next_xid; + + StaticAssertDecl(sizeof(int) >= sizeof(TransactionId), + "TransactionId exceeds the width of int"); + Assert(TransactionIdPrecedesOrEquals(from_xid, to_xid)); + + /* + * Calculate the distance between the two XIDs. Normally this is cheap; in + * the unusual case where the XIDs cross the wrap point, we do it the hard + * way. + */ + if (to_xid >= from_xid) + nxids = to_xid - from_xid; + else + { + nxids = 0; + next_xid = from_xid; + while (TransactionIdPrecedes(next_xid, to_xid)) + { + nxids++; + TransactionIdAdvance(next_xid); + } + } + + return nxids; +} + #endif /* FRONTEND */ #endif /* TRANSAM_H */ diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h index 583a667a40..0de6aed6e0 100644 --- a/src/include/utils/snapshot.h +++ b/src/include/utils/snapshot.h @@ -160,8 +160,10 @@ typedef struct SnapshotData /* * For normal MVCC snapshot this contains the all xact IDs that are in * progress, unless the snapshot was taken during recovery in which case - * it's empty. For historic MVCC snapshots, the meaning is inverted, i.e. - * it contains *committed* transactions between xmin and xmax. + * it's empty. In the case of !suboverflowed, there's a situation where + * this contains both top and sub-transaction IDs in a mixed manner. For + * historic MVCC snapshots, the meaning is inverted, i.e. it contains + * *committed* transactions between xmin and xmax. * * note: all ids in xip[] satisfy xmin <= xip[i] < xmax */ @@ -171,8 +173,10 @@ typedef struct SnapshotData /* * For non-historic MVCC snapshots, this contains subxact IDs that are in * progress (and other transactions that are in progress if taken during - * recovery). For historic snapshot it contains *all* xids assigned to the - * replayed transaction, including the toplevel xid. + * recovery). In the case of !suboverflowed, there's a situation where this + * contains both top and sub-transaction IDs in a mixed manner. For + * historic snapshot it contains *all* xids assigned to the replayed + * transaction, including the toplevel xid. * * note: all ids in subxip[] are >= xmin, but we don't bother filtering * out any that are >= xmax -- 2.31.1