Re: Possible bug in logical replication.

Arseny Sher Fri, 15 Jun 2018 08:29:04 -0700

Alvaro Herrera <alvhe...@2ndquadrant.com> writes:

> Can somebody (Arseny, Konstantin, horiguti, Sawada) please confirm that
> Michaël's commit fixes the reported bug?


I confirm that starting reading WAL since restart_lsn as implemented in
f731cfa fixes this issue, as well as the second issue tushar mentioned
at [1]. I think that the code still can be improved a bit though --
consider the attached patch:
* pg_logical_replication_slot_advance comment was not very informative
  and even a bit misleading: it said that we use confirmed_flush_lsn and
  restart_lsn, but didn't explain why.
* Excessive check in its main loop.
* Copy-paste comment fix.

[1] 
https://www.postgresql.org/message-id/5f85bf41-098e-c4e1-7332-9171fef57a0a%40enterprisedb.com

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From d8ed8ae3eec54b716d7dbb35379d0047a96c6c75 Mon Sep 17 00:00:00 2001
From: Arseny Sher <sher-...@yandex.ru>
Date: Fri, 15 Jun 2018 18:11:17 +0300
Subject: [PATCH] Cosmetic review for f731cfa.

* pg_logical_replication_slot_advance comment was not very informative and even
  a bit misleading: it said that we use confirmed_flush_lsn and restart_lsn, but
  didn't explain why.
* Excessive check in its main loop.
* Copy-paste comment fix.
---
 src/backend/replication/slotfuncs.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2806e1076c..597e81f4d9 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -341,24 +341,26 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
 
 /*
  * Helper function for advancing logical replication slot forward.
- * The slot's restart_lsn is used as start point for reading records,
- * while confirmed_lsn is used as base point for the decoding context.
- * The LSN position to move to is checked by doing a per-record scan and
- * logical decoding which makes sure that confirmed_lsn is updated to a
- * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * While we could just do LogicalConfirmReceivedLocation updating
+ * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn allowing
+ * to recycle WAL and old catalog tuples. We do it in special fast_forward
+ * mode without actual replay.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
 {
 	LogicalDecodingContext *ctx;
 	ResourceOwner old_resowner = CurrentResourceOwner;
+	/*
+	 * Start reading WAL at restart_lsn, which certainly points to the valid
+	 * record.
+	 */
 	XLogRecPtr	startlsn = MyReplicationSlot->data.restart_lsn;
 	XLogRecPtr	retlsn = MyReplicationSlot->data.confirmed_flush;
 
 	PG_TRY();
 	{
-		/* restart at slot's confirmed_flush */
+		/* start_lsn doesn't matter here, we don't replay xacts at all */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 									NIL,
 									true,
@@ -388,17 +390,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			 */
 			startlsn = InvalidXLogRecPtr;
 
-			/*
-			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-			 * store the description into our tuplestore.
-			 */
+			/* Changes are not actually produced in fast_forward mode. */
 			if (record != NULL)
 				LogicalDecodingProcessRecord(ctx, ctx->reader);
 
-			/* Stop once the moving point wanted by caller has been reached */
-			if (moveto <= ctx->reader->EndRecPtr)
-				break;
-
 			CHECK_FOR_INTERRUPTS();
 		}
 
-- 
2.11.0

Re: Possible bug in logical replication.

Reply via email to