On Mon, Dec 30, 2019 at 11:17 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Fri, Dec 27, 2019 at 8:37 PM Alvaro Herrera <alvhe...@2ndquadrant.com> > wrote: > > > > On 2019-Dec-27, vignesh C wrote: > > > > > I felt amit solution also solves the problem. Attached patch has the > > > fix based on the solution proposed. > > > Thoughts? > > > > This seems a sensible fix to me, though I didn't try to reproduce the > > failure. > > > > > @@ -2472,6 +2457,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, > > > ReorderBufferTXN *txn) > > > } > > > > > > ReorderBufferSerializeChange(rb, txn, fd, change); > > > + txn->final_lsn = change->lsn; > > > dlist_delete(&change->node); > > > ReorderBufferReturnChange(rb, change); > > > > Should this be done insider ReorderBufferSerializeChange itself, instead > > of in its caller? > > > > makes sense. But, I think we should add a comment specifying the > reason why it is important to set final_lsn while serializing the > change.
Fixed > > Also, would it be sane to verify that the TXN > > doesn't already have a newer final_lsn? Maybe as an Assert. > > > > I don't think this is a good idea because we update the final_lsn with > commit_lsn in ReorderBufferCommit after which we can try to serialize > the remaining changes. Instead, we should update it only if the > change_lsn value is greater than final_lsn. > Fixed. Thanks Alvaro & Amit for your suggestions. I have made the changes based on your suggestions. Please find the updated patch for the same. I have also verified the patch in back branches. Separate patch was required for Release-10 branch, patch for the same is attached as 0001-Reorder-buffer-crash-while-aborting-old-transactions-REL_10.patch. Thoughts? Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
From dcd6c495c32d1dfd8ca3e723a0d45584ab5d10e7 Mon Sep 17 00:00:00 2001 From: vignesh <vignesh@localhost.localdomain> Date: Mon, 30 Dec 2019 22:59:05 +0530 Subject: [PATCH] Reorder buffer crash while aborting old transactions. While aborting aborted transactions it crashed as there are no reorderbuffer changes present in the list because all the reorderbuffer changes were serialized. Fixing it by storing the last change's lsn while serializing the reorderbuffer changes. This lsn will be used as final_lsn which will help in cleaning of spilled files in pg_replslot. --- src/backend/replication/logical/reorderbuffer.c | 26 +++++++++++-------------- src/include/replication/reorderbuffer.h | 4 ++-- 2 files changed, 13 insertions(+), 17 deletions(-) diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 53affeb..a807a75 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -1956,21 +1956,6 @@ ReorderBufferAbortOld(ReorderBuffer *rb, TransactionId oldestRunningXid) if (TransactionIdPrecedes(txn->xid, oldestRunningXid)) { - /* - * We set final_lsn on a transaction when we decode its commit or - * abort record, but we never see those records for crashed - * transactions. To ensure cleanup of these transactions, set - * final_lsn to that of their last change; this causes - * ReorderBufferRestoreCleanup to do the right thing. - */ - if (txn->serialized && txn->final_lsn == 0) - { - ReorderBufferChange *last = - dlist_tail_element(ReorderBufferChange, node, &txn->changes); - - txn->final_lsn = last->lsn; - } - elog(DEBUG2, "aborting old transaction %u", txn->xid); /* remove potential on-disk data, and deallocate this tx */ @@ -2679,6 +2664,17 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn, } pgstat_report_wait_end(); + /* + * We set final_lsn on a transaction when we decode its commit or abort + * record, but we never see those records for crashed transactions. To + * ensure cleanup of these transactions, set final_lsn to that of their + * last change; this causes ReorderBufferRestoreCleanup to do the right + * thing. Final_lsn would have been set with commit_lsn earlier when we + * decode it commit, no need to update in that case + */ + if (txn->final_lsn < change->lsn) + txn->final_lsn = change->lsn; + Assert(ondisk->change.action == change->action); } diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h index 0867ee9..b58c19a 100644 --- a/src/include/replication/reorderbuffer.h +++ b/src/include/replication/reorderbuffer.h @@ -188,8 +188,8 @@ typedef struct ReorderBufferTXN * * plain abort record * * prepared transaction abort * * error during decoding - * * for a crashed transaction, the LSN of the last change, regardless of - * what it was. + * * for a transaction with serialized changes, the LSN of the latest + * serialized one, unless one of the above cases. * ---- */ XLogRecPtr final_lsn; -- 1.8.3.1
From f38ee9810f21397faa1e0fbb699360ee129119df Mon Sep 17 00:00:00 2001 From: vignesh <vignesh@localhost.localdomain> Date: Mon, 30 Dec 2019 22:52:28 +0530 Subject: [PATCH] Reorder buffer crash while aborting old transactions. While aborting aborted transactions it crashed as there are no reorderbuffer changes present in the list because all the reorderbuffer changes were serialized. Fixing it by storing the last change's lsn while serializing the reorderbuffer changes. This lsn will be used as final_lsn which will help in cleaning of spilled files in pg_replslot. --- src/backend/replication/logical/reorderbuffer.c | 26 +++++++++++-------------- src/include/replication/reorderbuffer.h | 4 ++-- 2 files changed, 13 insertions(+), 17 deletions(-) diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index be441c7..bc26ba1 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -1867,21 +1867,6 @@ ReorderBufferAbortOld(ReorderBuffer *rb, TransactionId oldestRunningXid) if (TransactionIdPrecedes(txn->xid, oldestRunningXid)) { - /* - * We set final_lsn on a transaction when we decode its commit or - * abort record, but we never see those records for crashed - * transactions. To ensure cleanup of these transactions, set - * final_lsn to that of their last change; this causes - * ReorderBufferRestoreCleanup to do the right thing. - */ - if (txn->serialized && txn->final_lsn == 0) - { - ReorderBufferChange *last = - dlist_tail_element(ReorderBufferChange, node, &txn->changes); - - txn->final_lsn = last->lsn; - } - elog(DEBUG2, "aborting old transaction %u", txn->xid); /* remove potential on-disk data, and deallocate this tx */ @@ -2457,6 +2442,17 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn, } pgstat_report_wait_end(); + /* + * We set final_lsn on a transaction when we decode its commit or abort + * record, but we never see those records for crashed transactions. To + * ensure cleanup of these transactions, set final_lsn to that of their + * last change; this causes ReorderBufferRestoreCleanup to do the right + * thing. Final_lsn would have been set with commit_lsn earlier when we + * decode it commit, no need to update in that case + */ + if (txn->final_lsn < change->lsn) + txn->final_lsn = change->lsn; + Assert(ondisk->change.action == change->action); } diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h index e5cbe25..cf2d4e7 100644 --- a/src/include/replication/reorderbuffer.h +++ b/src/include/replication/reorderbuffer.h @@ -167,8 +167,8 @@ typedef struct ReorderBufferTXN * * plain abort record * * prepared transaction abort * * error during decoding - * * for a crashed transaction, the LSN of the last change, regardless of - * what it was. + * * for a transaction with serialized changes, the LSN of the latest + * serialized one, unless one of the above cases. * ---- */ XLogRecPtr final_lsn; -- 1.8.3.1