Hi,

On 5/8/23 4:42 AM, Amit Kapila wrote:
On Sat, May 6, 2023 at 9:33 PM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:

On 5/6/23 3:28 PM, Amit Kapila wrote:
On Sat, May 6, 2023 at 1:52 PM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:

Next steps:
=========
1. The first thing is that we should verify this theory by adding some LOG
in KeepLogSeg() to see if the _logSegNo is reduced due to the value
returned by XLogGetReplicationSlotMinimumLSN().

Yeah, will do that early next week.

It's done with the following changes:

https://github.com/bdrouvot/postgres/commit/79e1bd9ab429a22f876b9364eb8a0da2dacaaef7#diff-c1cb3ab2a19606390c1a7ed00ffe5a45531702ca5faf999d401c548f8951c65bL7454-R7514
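
For context, the instrumentation is essentially a couple of elog(LOG) calls in KeepLogSeg() emitted each time the cutoff segment is (re)computed. Here is a simplified sketch of the idea, not the exact code from the commit above (which, as visible in the output below, also adds similar logging in CreateRestartPoint()):

/*
 * KeepLogSeg() in src/backend/access/transam/xlog.c, instrumented to show
 * where the cutoff segment gets lowered.  Sketch only; the unrelated parts
 * of the function are elided.
 */
static void
KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
{
	XLogSegNo	currSegNo;
	XLogSegNo	segno;
	XLogRecPtr	keep;

	XLByteToSeg(recptr, currSegNo, wal_segment_size);
	segno = currSegNo;

	elog(LOG, "KeepLogSeg: segno changed to " UINT64_FORMAT " due to XLByteToSeg",
		 segno);

	/* Segments still needed by replication slots (possibly capped below). */
	keep = XLogGetReplicationSlotMinimumLSN();
	if (keep != InvalidXLogRecPtr && keep < recptr)
	{
		XLByteToSeg(keep, segno, wal_segment_size);

		elog(LOG, "KeepLogSeg: segno changed to " UINT64_FORMAT " due to XLogGetReplicationSlotMinimumLSN()",
			 segno);

		/* ... max_slot_wal_keep_size handling unchanged ... */
	}

	/* ... wal_keep_size handling unchanged ... */

	/* don't delete WAL segments newer than the calculated minimum */
	if (segno < *logSegNo)
		*logSegNo = segno;
}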

With that in place, there is one failing test here: 
https://cirrus-ci.com/task/5173216310722560

Where we can see:

2023-05-08 07:42:56.301 UTC [18038][checkpointer] LOCATION:  UpdateMinRecoveryPoint, xlog.c:2500
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG:  00000: CreateRestartPoint: After XLByteToSeg RedoRecPtr is 0/4000098, _logSegNo is 4
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION:  CreateRestartPoint, xlog.c:7271
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG:  00000: KeepLogSeg: segno changed to 4 due to XLByteToSeg
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION:  KeepLogSeg, xlog.c:7473
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG:  00000: KeepLogSeg: segno changed to 3 due to XLogGetReplicationSlotMinimumLSN()
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION:  KeepLogSeg, xlog.c:7483
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG:  00000: CreateRestartPoint: After KeepLogSeg RedoRecPtr is 0/4000098, endptr is 0/4000148, _logSegNo is 3
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION:  CreateRestartPoint, xlog.c:7284
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG:  00000: BDT1 about to call RemoveOldXlogFiles in CreateRestartPoint
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION:  CreateRestartPoint, xlog.c:7313
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG:  00000: attempting to remove WAL segments older than log file 000000000000000000000002

So the suspicion about XLogGetReplicationSlotMinimumLSN() was correct (_logSegNo moved from 4 to 3 due to XLogGetReplicationSlotMinimumLSN()).
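
For the record, XLogGetReplicationSlotMinimumLSN() simply returns the minimum LSN still required by the replication slots, as maintained by ReplicationSlotsComputeRequiredLSN() (i.e. the minimum restart_lsn across the in-use slots). Roughly, from xlog.c:

/*
 * Return the oldest LSN still needed by any replication slot, as cached in
 * shared memory (the value is maintained by ReplicationSlotsComputeRequiredLSN()).
 */
XLogRecPtr
XLogGetReplicationSlotMinimumLSN(void)
{
	XLogRecPtr	retval;

	SpinLockAcquire(&XLogCtl->info_lck);
	retval = XLogCtl->replicationSlotMinLSN;
	SpinLockRelease(&XLogCtl->info_lck);

	return retval;
}

So the physical slot created on the standby is enough to pull the cutoff back, independently of the invalidated logical slots.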

What about postponing the physical slot creation on the standby and the
cascading standby node initialization until after this test?


Yeah, that is also possible. But I have a few questions regarding
that: (a) There doesn't seem to be a physical slot on the cascading
standby; if I am missing something, can you please point me to the
relevant part of the test?

That's right. There is a physical slot only on the primary and on the standby.

What I meant up-thread was to also postpone the cascading standby node initialization
until after this test (i.e., once the physical slot on the standby is created).

Please find attached a proposal doing so.

(b) Which test is currently dependent on
the physical slot on the standby?

Not a test, but the cascading standby initialization, which relies on the
"primary_slot_name" parameter.

Also, I now think it's better to have the physical slot on the standby plus
hot_standby_feedback (hsf) set to on on the cascading standby (inherited from
the standby backup).

The idea is to avoid any risk of logical slot invalidation on the cascading standby
during the standby promotion test.

That was not the case before the attached proposal, though (hsf was off on the cascading standby).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 38c67b0fd8f2d83e97a93108484fe23c881dd91c Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot...@gmail.com>
Date: Mon, 8 May 2023 07:02:50 +0000
Subject: [PATCH v1] fix retaining WAL test

---
 .../t/035_standby_logical_decoding.pl         | 38 ++++++++++---------
 1 file changed, 21 insertions(+), 17 deletions(-)
 100.0% src/test/recovery/t/

diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index ad1def2474..4bda706eae 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -276,20 +276,6 @@ $node_standby->append_conf('postgresql.conf',
        max_replication_slots = 5]);
 $node_standby->start;
 $node_primary->wait_for_replay_catchup($node_standby);
-$node_standby->safe_psql('testdb', qq[SELECT * FROM pg_create_physical_replication_slot('$standby_physical_slotname');]);
-
-#######################
-# Initialize cascading standby node
-#######################
-$node_standby->backup($backup_name);
-$node_cascading_standby->init_from_backup(
-       $node_standby, $backup_name,
-       has_streaming => 1,
-       has_restoring => 1);
-$node_cascading_standby->append_conf('postgresql.conf',
-       qq[primary_slot_name = '$standby_physical_slotname']);
-$node_cascading_standby->start;
-$node_standby->wait_for_replay_catchup($node_cascading_standby, $node_primary);
 
 #######################
 # Initialize subscriber node
@@ -503,9 +489,6 @@ check_slots_conflicting_status(1);
 # Verify that invalidated logical slots do not lead to retaining WAL.
 ##################################################
 
-# Wait for the cascading standby to catchup before removing the WAL file(s)
-$node_standby->wait_for_replay_catchup($node_cascading_standby, $node_primary);
-
 # Get the restart_lsn from an invalidated slot
 my $restart_lsn = $node_standby->safe_psql('postgres',
        "SELECT restart_lsn from pg_replication_slots WHERE slot_name = 
'vacuum_full_activeslot' and conflicting is true;"
@@ -538,6 +521,27 @@ my $standby_walfile = $node_standby->data_dir . '/pg_wal/' . $walfile_name;
 ok(!-f "$standby_walfile",
        "invalidated logical slots do not lead to retaining WAL");
 
+# Now create the physical slot on the standby and initialize the cascading standby node.
+# It's done here because the physical slot on the standby could prevent the WAL
+# file from being removed in the test above (the WAL file removal could still be
+# made possible, but we want to keep the above test simple).
+# Also please note that the cascading standby will be created with hot_standby_feedback
+# set to on thanks to the last change_hot_standby_feedback_and_wait_for_xmins() call.
+$node_standby->safe_psql('testdb', qq[SELECT * FROM pg_create_physical_replication_slot('$standby_physical_slotname');]);
+
+#######################
+# Initialize cascading standby node
+#######################
+$node_standby->backup($backup_name);
+$node_cascading_standby->init_from_backup(
+       $node_standby, $backup_name,
+       has_streaming => 1,
+       has_restoring => 1);
+$node_cascading_standby->append_conf('postgresql.conf',
+       qq[primary_slot_name = '$standby_physical_slotname']);
+$node_cascading_standby->start;
+$node_standby->wait_for_replay_catchup($node_cascading_standby, $node_primary);
+
 ##################################################
 # Recovery conflict: Invalidate conflicting slots, including in-use slots
 # Scenario 2: conflict due to row removal with hot_standby_feedback off.
-- 
2.34.1
