Hi,

On 5/2/23 8:28 AM, Amit Kapila wrote:
On Fri, Apr 28, 2023 at 2:24 PM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:

I can see V7 failing on "Cirrus CI / macOS - Ventura - Meson" only (other 
machines are not complaining).

It does fail on "invalidated logical slots do not lead to retaining WAL", see 
https://cirrus-ci.com/task/4518083541336064

I'm not sure why it is failing, any idea?


I think the reason for the failure is that on standby, the test is not
able to remove the file corresponding to the invalid slot. You are
using pg_switch_wal() to generate a switch record and I think you need
one more WAL-generating statement after that to achieve your purpose
which is that during checkpoint, the tes removes the WAL file
corresponding to an invalid slot. Just doing checkpoint on primary may
not serve the need as that doesn't lead to any new insertion of WAL on
standby. Is your v6 failing in the same environment?

Thanks for the feedback!

No V6 was working fine.

If not, then it
is probably due to the reason that the test is doing insert after
pg_switch_wal() in that version. Why did you change the order of
insert in v7?


I thought doing the insert before the switch was ok and as my local test
was running fine I did not re-consider the ordering.

BTW, you can confirm the failure by changing the DEBUG2 message in
RemoveOldXlogFiles() to LOG. In the case, where the test fails, it may
not remove the WAL file corresponding to an invalid slot whereas it
will remove the WAL file when the test succeeds.

Yeah, I added more debug information and what I can see is that the WAL file
we want to see removed is "000000010000000000000003" while the standby emits:

"
2023-05-02 10:03:28.351 UTC [16971][checkpointer] LOG:  attempting to remove 
WAL segments older than log file 000000000000000000000002
2023-05-02 10:03:28.351 UTC [16971][checkpointer] LOG:  recycled write-ahead log file 
"000000010000000000000002"
"

As per your suggestion, changing the insert ordering (like in V8 attached) 
makes it now work on the failing environment too.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From defaeb000b09eab9ba7b5d08c032f81b5bd72a53 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot...@gmail.com>
Date: Fri, 28 Apr 2023 07:27:20 +0000
Subject: [PATCH v8] Add a test to verify that invalidated logical slots do not
 lead to retaining WAL.

---
 .../t/035_standby_logical_decoding.pl         | 39 ++++++++++++++++++-
 1 file changed, 37 insertions(+), 2 deletions(-)
 100.0% src/test/recovery/t/

diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl 
b/src/test/recovery/t/035_standby_logical_decoding.pl
index 66d264f230..a6b640d7fd 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -500,9 +500,44 @@ $node_standby->restart;
 check_slots_conflicting_status(1);
 
 ##################################################
-# Verify that invalidated logical slots do not lead to retaining WAL
+# Verify that invalidated logical slots do not lead to retaining WAL.
 ##################################################
-# XXXXX TODO
+
+# Before removing WAL file(s), ensure the cascading standby catch up
+$node_standby->wait_for_replay_catchup($node_cascading_standby,
+       $node_primary);
+
+# Get the restart_lsn from an invalidated slot
+my $restart_lsn = $node_standby->safe_psql('postgres',
+       "SELECT restart_lsn from pg_replication_slots WHERE slot_name = 
'vacuum_full_activeslot' and conflicting is true;"
+);
+
+chomp($restart_lsn);
+
+# As pg_walfile_name() can not be executed on the standby,
+# get the WAL file name associated to this lsn from the primary
+my $walfile_name = $node_primary->safe_psql('postgres',
+       "SELECT pg_walfile_name('$restart_lsn')");
+
+chomp($walfile_name);
+
+# Generate some activity and switch WAL file on the primary
+$node_primary->safe_psql(
+       'postgres', "create table retain_test(a int);
+                                                                        select 
pg_switch_wal();
+                                                                        insert 
into retain_test values(1);
+                                                                        
checkpoint;");
+
+# Wait for the standby to catch up
+$node_primary->wait_for_catchup($node_standby);
+
+# Request a checkpoint on the standby to trigger the WAL file(s) removal
+$node_standby->safe_psql('postgres', 'checkpoint;');
+
+# Verify that the wal file has not been retained on the standby
+my $standby_walfile = $node_standby->data_dir . '/pg_wal/' . $walfile_name;
+ok(!-f "$standby_walfile",
+       "invalidated logical slots do not lead to retaining WAL");
 
 ##################################################
 # Recovery conflict: Invalidate conflicting slots, including in-use slots
-- 
2.34.1

Reply via email to