Hello,

On 2019-Sep-06, Andres Freund wrote:

> On 2019-08-19 08:51:43 -0700, Andres Freund wrote:
> > On August 19, 2019 7:43:28 AM PDT, Alvaro Herrera 
> > <alvhe...@2ndquadrant.com> wrote:

> > >Never mind.  I was able to reproduce it later, and verify that Andres'
> > >proposed strategy doesn't seem to fix the problem.  I'm going to study
> > >the problem again today.
> > 
> > Could you post the patch?

Here's a couple of patches.

always_decode_assignment.patch is Masahiko Sawada's patch, which has
been confirmed to fix the assertion failure.

assign-child.patch is what I understood you were proposing -- namely to
assign the subxid to the top-level xid on NEW_CID.  In order for it to
work at all, I had to remove a different safety check; but the assertion
still hits when running Ildar's test case.  So the patch doesn't
actually fix anything.  And I think it makes sense that it fails, since
the first thing that's happening in this patch is that we create both
the top-level xact and the subxact with the same LSN value, which is
what triggers the assertion in the first place.  It's possible that I
misunderstood what you were suggesting.

If you want to propose a different fix, be my guest, but failing that
I'm inclined to push always_decode_assignment.patch sometime before the
end of the week.

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5315d93af0..582b6a2b96 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -361,10 +361,12 @@ DecodeHeap2Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	/*
 	 * If we don't have snapshot or we are just fast-forwarding, there is no
-	 * point in decoding changes.
+	 * point in decoding changes (except NEW_CID assignments, because we need
+	 * to have subxid applied to xid in that case.)
 	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
-		ctx->fast_forward)
+	if ((info != XLOG_HEAP2_NEW_CID) &&
+		(SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
+		 ctx->fast_forward))
 		return;
 
 	switch (info)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 8ce28ad629..6faba6077e 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -778,9 +778,6 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
 	txn = ReorderBufferTXNByXid(rb, xid, true, &new_top, lsn, true);
 	subtxn = ReorderBufferTXNByXid(rb, subxid, true, &new_sub, lsn, false);
 
-	if (new_top && !new_sub)
-		elog(ERROR, "subtransaction logged without previous top-level txn record");
-
 	if (!new_sub)
 	{
 		if (subtxn->is_known_as_subxact)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 0bd1d0f954..326f5b7809 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -772,6 +772,13 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
 {
 	CommandId	cid;
 
+	/*
+	 * In rare cases, it is possible that we're seeing a subtransaction
+	 * for the first time.  Record it in that case.
+	 */
+	if (xlrec->top_xid != xid)
+		ReorderBufferAssignChild(builder->reorder, xlrec->top_xid, xid, lsn);
+
 	/*
 	 * we only log new_cid's if a catalog tuple was modified, so mark the
 	 * transaction as containing catalog modifications
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5315d93af0..c53e7e2279 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -217,12 +217,15 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	uint8		info = XLogRecGetInfo(r) & XLOG_XACT_OPMASK;
 
 	/*
-	 * No point in doing anything yet, data could not be decoded anyway. It's
-	 * ok not to call ReorderBufferProcessXid() in that case, except in the
-	 * assignment case there'll not be any later records with the same xid;
-	 * and in the assignment case we'll not decode those xacts.
+	 * If the snapshot isn't yet fully built, we cannot decode anything, so
+	 * bail out.
+	 *
+	 * However, it's critical to process XLOG_XACT_ASSIGNMENT records even
+	 * when the snapshot is being built: it is possible to get later records
+	 * that require subxids to be properly assigned.
 	 */
-	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
+	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT &&
+		info != XLOG_XACT_ASSIGNMENT)
 		return;
 
 	switch (info)

Reply via email to