On Tue, Apr 22, 2025 at 12:36 PM Zhijie Hou (Fujitsu)
<houzj.f...@fujitsu.com> wrote:
>
> When analyzing some issues in another thread[1], I found a bug that fast 
> forward
> decoding could lead to premature advancement of catalog_xmin, resulting in
> required catalog data being removed by vacuum.
>
> The issue arises because we do not build a base snapshot when decoding changes
> during fast forward mode, preventing reference to the minimum transaction ID
> that remains visible in the snapshot when determining the candidate for
> catalog_xmin. As a result, catalog_xmin was directly advanced to the oldest
> running transaction ID found in the latest running_xacts record.
>
> In code-level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot could 
> not
> be reached during fast forward decoding, resulting in
> rb->txns_by_base_snapshot_lsn being NULL. When advancing catalog_xmin, the
> system attempts to refer to rb->txns_by_base_snapshot_lsn via
> ReorderBufferGetOldestXmin(). However, since rb->txns_by_base_snapshot_lsn is
> NULL, it defaults to directly using running->oldestRunningXid as the candidate
> for catalog_xmin. For reference, see the details in
> SnapBuildProcessRunningXacts.
>
> See the attachment for a test(0002) to prove that the catalog data that are
> still required would be removed after premature catalog_xmin advancement 
> during
> fast forward decoding.
>
> To fix this, I think we can allow the base snapshot to be built during fast
> forward decoding, as implemented in the patch 0001 (We already built base
> snapshot in fast-forward mode for logical message in logicalmsg_decode()).
>

The same bug was fixed for non-fast_forward mode in commit f49a80c4.
See the following code in that commit:

-       LogicalIncreaseXminForSlot(lsn, running->oldestRunningXid);
+       xmin = ReorderBufferGetOldestXmin(builder->reorder);
+       if (xmin == InvalidTransactionId)
+               xmin = running->oldestRunningXid;
....
+       LogicalIncreaseXminForSlot(lsn, xmin);

Is my understanding correct? If so, then I think it is a miss in the
commit f49a80c4 to consider fast_forward mode. Please refer commit in
the commit message.

The code changes look correct to me. I have some suggestions related
to comments in the code. See attached. Please prepare patches for back
branches as well.

-- 
With Regards,
Amit Kapila.
diff --git a/src/backend/replication/logical/decode.c 
b/src/backend/replication/logical/decode.c
index 8dc467a8bb3..cc03f0706e9 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -413,10 +413,10 @@ heap2_decode(LogicalDecodingContext *ctx, 
XLogRecordBuffer *buf)
        /*
         * If we don't have snapshot or we are just fast-forwarding, there is no
         * point in decoding data changes. However, it's crucial to build the 
base
-        * snapshot during fast-forward mode (as done in 
SnapBuildProcessChange())
-        * because the snapshot's minimum visible transaction ID (xmin) must be
-        * referenced when determining the candidate catalog_xmin for the
-        * replication slot.
+        * snapshot during fast-forward mode (as is done in
+        * SnapBuildProcessChange()) because we require the snapshot's xmin when
+        * determining the candidate catalog_xmin for the replication slot. See
+        * SnapBuildProcessRunningXacts().
         */
        if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
                return;
@@ -477,10 +477,10 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer 
*buf)
        /*
         * If we don't have snapshot or we are just fast-forwarding, there is no
         * point in decoding data changes. However, it's crucial to build the 
base
-        * snapshot during fast-forward mode (as done in 
SnapBuildProcessChange())
-        * because the snapshot's minimum visible transaction ID (xmin) must be
-        * referenced when determining the candidate catalog_xmin for the
-        * replication slot.
+        * snapshot during fast-forward mode (as is done in
+        * SnapBuildProcessChange()) because we require the snapshot's xmin when
+        * determining the candidate catalog_xmin for the replication slot. See
+        * SnapBuildProcessRunningXacts().
         */
        if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
                return;

Reply via email to