On Wed, Oct 27, 2021 at 10:11:00AM +0200, Ronan Dunklau wrote:
> Sorry I sent an intermediary version of the patch, here is the correct one.

While looking at this patch, I have figured out a simple way to make
the tests faster without impacting the coverage.  The size of the WAL
segments archived is a bit of a bottleneck as they need to be zeroed
by pg_receivewal at creation.  This finishes by being a waste of time,
even if we don't flush the data.  So I'd like to switch the test so as
we use a segment size of 1MB, first.

A second thing is that we copy too many segments than really needed
when using the slot's restart_lsn as starting point as RESERVE_WAL
would use the current redo location, so it seems to me that a
checkpoint is in order before the slot creation.  A third thing is
that generating some extra data after the end LSN we want to use makes
the test much faster at the end.

With those three methods combined, the test goes down from 11s to 9s
here.  Attached is a patch I'd like to apply to make the test
cheaper.

I also had a look at your patch.  Here are some comments.

+# Cleanup the previous stream directories to reuse them
+unlink glob "'${stream_dir}/*'";
+unlink glob "'${slot_dir}/*'";
I think that we'd better choose a different location for the
archives.  Keeping the segments of the previous tests is useful for
debugging if a previous step of the test failed.

+$standby->psql('',
+  "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
+  replication => 1);
Here as well we could use a restart point to reduce the number of
segments archived.

+# Now, try to resume after the promotion, from the folder.
+$standby->command_ok(
+   [ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+    '--slot', $folder_slot, '--no-sync'],
+   "Stream some wal after promoting, resuming from the folder's position");
What is called the "resume-from-folder" test in the patch is IMO
too costly and brings little extra value, requiring two commands of
pg_receivewal (one to populate the folder and one to test the TLI jump
from the previously-populated point) to test basically the same thing
as when the starting point is taken from the slot.  Except that
restarting from the slot is twice cheaper.  The point of the
resume-from-folder case is to make sure that we restart from the point
of the archive folder rather than the slot's restart_lsn, but your
test fails this part, in fact, because the first command populating
the archive folder also uses "--slot $folder_slot", updating the
slot's restart_lsn before the second pg_receivewal uses this slot
again.

I think, as a whole, that testing for the case where an archive folder
is populated (without a slot!), followed by a second command where we
use a slot that has a restart_lsn older than the archive's location,
to not be that interesting.  If we include such a test, there is no
need to include that within the TLI jump part, in my opinion.  So I
think that we had better drop this part of the patch, and keep only
the case where we resume from a slot for the TLI jump.

The commands of pg_receivewal included in the test had better use -n
so as there is no infinite loop on failure.
--
Michael
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index 092c9b6f25..87edcb6200 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -15,7 +15,7 @@ program_options_handling_ok('pg_receivewal');
 umask(0077);
 
 my $primary = PostgreSQL::Test::Cluster->new('primary');
-$primary->init(allows_streaming => 1);
+$primary->init(allows_streaming => 1, extra => ['--wal-segsize=1']);
 $primary->start;
 
 my $stream_dir = $primary->basedir . '/archive_wal';
@@ -167,7 +167,9 @@ mkdir($slot_dir);
 $slot_name = 'archive_slot';
 
 # Setup the slot, reserving WAL at creation (corresponding to the
-# last redo LSN here, actually).
+# last redo LSN here, actually, so use a checkpoint to reduce the
+# number of segments archived).
+$primary->psql('postgres', 'checkpoint;');
 $primary->psql('postgres',
 	"SELECT pg_create_physical_replication_slot('$slot_name', true);");
 
@@ -182,12 +184,16 @@ my $walfile_streamed = $primary->safe_psql(
 # Switch to a new segment, to make sure that the segment retained by the
 # slot is still streamed.  This may not be necessary, but play it safe.
 $primary->psql('postgres',
-	'INSERT INTO test_table VALUES (generate_series(1,100));');
+	'INSERT INTO test_table VALUES (generate_series(1,10));');
 $primary->psql('postgres', 'SELECT pg_switch_wal();');
 $nextlsn =
   $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();');
 chomp($nextlsn);
 
+# Add a bit more data to accelerate the end of the next commands.
+$primary->psql('postgres',
+	'INSERT INTO test_table VALUES (generate_series(1,10));');
+
 # Check case where the slot does not exist.
 $primary->command_fails_like(
 	[

Attachment: signature.asc
Description: PGP signature

Reply via email to