Le vendredi 29 octobre 2021, 04:27:51 CEST Michael Paquier a écrit :
> On Thu, Oct 28, 2021 at 03:55:12PM +0200, Ronan Dunklau wrote:
> > Interesting ideas, thanks. For the record, the time drops from ~4.5s to 3s
> > on average on my machine.
> > I think if you reduce the size of the generate_series batches, this should
> > probably be reduced everywhere. With what we do though, inserting a single
> > line should work just as well, I wonder why we insist on inserting a
> > hundred lines ? I updated your patch with that small modification, it
> > also makes the code less verbose.
>
> Thanks for the extra numbers. I have added your suggestions,
> switching the dummy table to use a primary key with different values,
> while on it, as there is an argument that it makes debugging easier,
> and applied the speedup patch.
Thanks !
>
> >> +$standby->psql('',
> >> + "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
> >> + replication => 1);
> >> Here as well we could use a restart point to reduce the number of
> >> segments archived.
> >
> > The restart point should be very close, as we don't generate any activity
> > on the primary between the backup and the slot's creation. I'm not sure
> > adding the complexity of triggering a checkpoint on the primary and
> > waiting for the standby to catch up on it would be that useful.
>
> Yes, you are right here. The base backup taken from the primary
> at this point ensures a fresh point.
>
> +# This test is split in two, using the same standby: one test check the
> +# resume-from-folder case, the other the resume-from-slot one.
> This comment needs a refresh, as the resume-from-folder case is no
> more.
>
Done.
> +$standby->psql(
> + 'postgres',
> + "SELECT pg_promote(wait_seconds => 300)");
> This could be $standby->promote.
>
Oh, didn't know about that.
> +# Switch wal to make sure it is not a partial file but a complete
> segment.
> +$primary->psql('postgres', 'INSERT INTO test_table VALUES (1);');
> +$primary->psql('postgres', 'SELECT pg_switch_wal();');
> +$primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
> This INSERT needs a slight change to adapt to the primary key of the
> table. This one is on me :p
Done.
>
> Anyway, is this first segment switch really necessary? From the data
> archived by pg_receivewal in the command testing the TLI jump, we
> finish with the following contents (contents generated after fixing
> the three INSERTs):
> 00000001000000000000000B
> 00000001000000000000000C
> 00000002000000000000000D
> 00000002000000000000000E.partial
> 00000002.history
>
> So, even if we don't do the first switch, we'd still have one
> completed segment on the previous timeline, before switching to the
> new timeline and the next segment (pg_receivewal is a bit inconsistent
> with the backend here, by the way, as the first segment on the new
> timeline would map with the last segment of the old timeline, but here
> we have a clean switch as of stop_streaming in pg_receivewal.c).
The first completed segment on the previous timeline comes from the fact we
stream from the restart point. I removed the switch to use the walfilename of
the replication slot's restart point instead. This means querying both the
standby (to get the replication slot's restart_lsn) and the primary (to have
access to pg_walfile_name).
We could use a single query on the primary (using the primary's checkpoint LSN
instead) but it feels a bit convoluted just to avoid a query on the standby.
--
Ronan Dunklau
>From eb65a8b5883d903dda78bb66c2e1795b48cafc24 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau <ronan.dunk...@aiven.io>
Date: Tue, 26 Oct 2021 10:54:12 +0200
Subject: [PATCH v17] Add a test for pg_receivewal following timeline switch.
pg_receivewal is able to follow a timeline switch, but this was not
tested anywher. This test case verify that it works as expected both
when resuming from a replication slot and from the archive directory,
which have different methods of retrieving the timeline it left off.
---
src/bin/pg_basebackup/t/020_pg_receivewal.pl | 57 +++++++++++++++++++-
1 file changed, 56 insertions(+), 1 deletion(-)
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 2da200396e..1fe32758a0 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -5,7 +5,7 @@ use strict;
use warnings;
use PostgreSQL::Test::Utils;
use PostgreSQL::Test::Cluster;
-use Test::More tests => 31;
+use Test::More tests => 35;
program_help_ok('pg_receivewal');
program_version_ok('pg_receivewal');
@@ -208,3 +208,58 @@ $primary->command_ok(
"WAL streamed from the slot's restart_lsn");
ok(-e "$slot_dir/$walfile_streamed",
"WAL from the slot's restart_lsn has been archived");
+
+# Test a timeline switch using a replication slot
+
+# Setup a standby for our tests
+my $backup_name = "basebackup";
+$primary->backup($backup_name);
+my $standby = PostgreSQL::Test::Cluster->new("standby");
+$standby->init_from_backup($primary, $backup_name, has_streaming => 1);
+$standby->start;
+
+# Create a replication slot on this new standby
+my $archive_slot = "archive_slot";
+$standby->psql('',
+ "CREATE_REPLICATION_SLOT $archive_slot PHYSICAL (RESERVE_WAL)",
+ replication => 1);
+# Wait for standby catchup
+$primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
+# Get a walfilename from before the promotion to make sure it is archived
+# after promotion
+my $replication_slot_lsn = $standby->safe_psql(
+ 'postgres',
+ "SELECT restart_lsn
+ FROM pg_replication_slots
+ WHERE slot_name = '$archive_slot';");
+# pg_walfile_name is not supported on a standby
+my $walfile_before_promotion =
+ $primary->safe_psql('postgres', "SELECT pg_walfile_name('$replication_slot_lsn');");
+# Everything is setup, promote the standby to trigger a timeline switch
+$standby->promote;
+
+# Force a wal switch to make sure at least one full WAL is archived on the new
+# timeline, and fetch this walfilename.
+my $walfile_after_promotion = $standby->safe_psql('postgres',
+ "SELECT pg_walfile_name(pg_current_wal_insert_lsn());");
+$standby->psql('postgres', 'INSERT INTO test_table VALUES (6);');
+$standby->psql('postgres', 'SELECT pg_switch_wal();');
+$nextlsn =
+ $standby->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
+chomp($nextlsn);
+$standby->psql('postgres', 'INSERT INTO test_table VALUES (7);');
+
+# Now try to resume from the slot after the promotion.
+my $timeline_dir = $primary->basedir . '/timeline_wal';
+mkdir($timeline_dir);
+
+$standby->command_ok(
+ [ 'pg_receivewal', '-D', $timeline_dir, '--verbose', '--endpos', $nextlsn,
+ '--slot', $archive_slot, '--no-sync', '-n'],
+ "Stream some wal after promoting, resuming from the slot's position");
+ok(-e "$timeline_dir/$walfile_before_promotion",
+ "WAL from the old timeline has been archived resuming from the slot");
+ok(-e "$timeline_dir/$walfile_after_promotion",
+ "WAL from the new timeline has been archived resuming from the slot");
+ok(-e "$timeline_dir/00000002.history",
+ "Timeline history file has been archived resuming from the slot");
--
2.33.1