Hello, On 2019-Sep-06, Andres Freund wrote:
> On 2019-08-19 08:51:43 -0700, Andres Freund wrote: > > On August 19, 2019 7:43:28 AM PDT, Alvaro Herrera > > <alvhe...@2ndquadrant.com> wrote: > > >Never mind. I was able to reproduce it later, and verify that Andres' > > >proposed strategy doesn't seem to fix the problem. I'm going to study > > >the problem again today. > > > > Could you post the patch? Here's a couple of patches. always_decode_assignment.patch is Masahiko Sawada's patch, which has been confirmed to fix the assertion failure. assign-child.patch is what I understood you were proposing -- namely to assign the subxid to the top-level xid on NEW_CID. In order for it to work at all, I had to remove a different safety check; but the assertion still hits when running Ildar's test case. So the patch doesn't actually fix anything. And I think it makes sense that it fails, since the first thing that's happening in this patch is that we create both the top-level xact and the subxact with the same LSN value, which is what triggers the assertion in the first place. It's possible that I misunderstood what you were suggesting. If you want to propose a different fix, be my guest, but failing that I'm inclined to push always_decode_assignment.patch sometime before the end of the week. Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index 5315d93af0..582b6a2b96 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -361,10 +361,12 @@ DecodeHeap2Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) /* * If we don't have snapshot or we are just fast-forwarding, there is no - * point in decoding changes. + * point in decoding changes (except NEW_CID assignments, because we need + * to have subxid applied to xid in that case.) */ - if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT || - ctx->fast_forward) + if ((info != XLOG_HEAP2_NEW_CID) && + (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT || + ctx->fast_forward)) return; switch (info) diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 8ce28ad629..6faba6077e 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -778,9 +778,6 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid, txn = ReorderBufferTXNByXid(rb, xid, true, &new_top, lsn, true); subtxn = ReorderBufferTXNByXid(rb, subxid, true, &new_sub, lsn, false); - if (new_top && !new_sub) - elog(ERROR, "subtransaction logged without previous top-level txn record"); - if (!new_sub) { if (subtxn->is_known_as_subxact) diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index 0bd1d0f954..326f5b7809 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -772,6 +772,13 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid, { CommandId cid; + /* + * In rare cases, it is possible that we're seeing a subtransaction + * for the first time. Record it in that case. + */ + if (xlrec->top_xid != xid) + ReorderBufferAssignChild(builder->reorder, xlrec->top_xid, xid, lsn); + /* * we only log new_cid's if a catalog tuple was modified, so mark the * transaction as containing catalog modifications
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index 5315d93af0..c53e7e2279 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -217,12 +217,15 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) uint8 info = XLogRecGetInfo(r) & XLOG_XACT_OPMASK; /* - * No point in doing anything yet, data could not be decoded anyway. It's - * ok not to call ReorderBufferProcessXid() in that case, except in the - * assignment case there'll not be any later records with the same xid; - * and in the assignment case we'll not decode those xacts. + * If the snapshot isn't yet fully built, we cannot decode anything, so + * bail out. + * + * However, it's critical to process XLOG_XACT_ASSIGNMENT records even + * when the snapshot is being built: it is possible to get later records + * that require subxids to be properly assigned. */ - if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT) + if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT && + info != XLOG_XACT_ASSIGNMENT) return; switch (info)