On 2020-06-09 20:19, Andres Freund wrote:
Hi,

On 2020-01-28 17:01:14 +0900, Michael Paquier wrote:
So attached is an updated patch which addresses the problem just by
marking a physical slot as dirty if any advancing is done.  Some
documentation is added about the fact that an advance is persistent
only at the follow-up checkpoint.  And the tests are fixed to not use
a fake LSN but instead advance to the latest LSN position produced.


-       /* Update the on disk state when lsn was updated. */
-       if (XLogRecPtrIsInvalid(endlsn))
-       {
-               ReplicationSlotMarkDirty();
-               ReplicationSlotsComputeRequiredXmin(false);
-               ReplicationSlotsComputeRequiredLSN();
-               ReplicationSlotSave();
-       }
-

I am quite confused by the wholesale removal of these lines. That wasn't
in previous versions of the patch. As far as I can tell not calling
ReplicationSlotsComputeRequiredLSN() for the physical slot leads to the
global minimum LSN never beeing advanced, and thus WAL reserved by the
slot not being removable.  Only if there's some independent call to
ReplicationSlotsComputeRequiredLSN() XLogSetReplicationSlotMinimumLSN()
will be called, allowing for slots to advance.

I realize this stuff has been broken since the introduction in
9c7d06d6068 (due to the above being if (XLogRecPtrIsInvalid()) rather
than if (!XLogRecPtrIsInvalid()) , but this seems to make it even worse?


Yes, there was a ReplicationSlotsComputeRequiredLSN() call inside pg_physical_replication_slot_advance() in the v5 of the patch:

@@ -370,6 +370,11 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
                MyReplicationSlot->data.restart_lsn = moveto;
                SpinLockRelease(&MyReplicationSlot->mutex);
                retlsn = moveto;
+
+               ReplicationSlotMarkDirty();
+
+               /* We moved retart_lsn, update the global value. */
+               ReplicationSlotsComputeRequiredLSN();

But later it has been missed and we have not noticed that.

I think that adding it back as per attached will be enough.


I find it really depressing how much obviously untested stuff gets
added in this area.


Prior to this patch pg_replication_slot_advance was not being tested at all. Unfortunately, added tests appeared to be not enough to cover all cases. It seems that the whole machinery of WAL holding and trimming is worth to be tested more thoroughly.


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 1b929a603e..8e543e276f 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -419,6 +419,11 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
 		 * crash, but this makes the data consistent after a clean shutdown.
 		 */
 		ReplicationSlotMarkDirty();
+
+		/*
+		 * We advanced retart_lsn, update the global minimum required value.
+		 */
+		ReplicationSlotsComputeRequiredLSN();
 	}
 
 	return retlsn;

Reply via email to