On Wed, Oct 27, 2021 at 10:11:00AM +0200, Ronan Dunklau wrote: > Sorry I sent an intermediary version of the patch, here is the correct one.
While looking at this patch, I have figured out a simple way to make the tests faster without impacting the coverage. The size of the WAL segments archived is a bit of a bottleneck as they need to be zeroed by pg_receivewal at creation. This finishes by being a waste of time, even if we don't flush the data. So I'd like to switch the test so as we use a segment size of 1MB, first. A second thing is that we copy too many segments than really needed when using the slot's restart_lsn as starting point as RESERVE_WAL would use the current redo location, so it seems to me that a checkpoint is in order before the slot creation. A third thing is that generating some extra data after the end LSN we want to use makes the test much faster at the end. With those three methods combined, the test goes down from 11s to 9s here. Attached is a patch I'd like to apply to make the test cheaper. I also had a look at your patch. Here are some comments. +# Cleanup the previous stream directories to reuse them +unlink glob "'${stream_dir}/*'"; +unlink glob "'${slot_dir}/*'"; I think that we'd better choose a different location for the archives. Keeping the segments of the previous tests is useful for debugging if a previous step of the test failed. +$standby->psql('', + "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)", + replication => 1); Here as well we could use a restart point to reduce the number of segments archived. +# Now, try to resume after the promotion, from the folder. +$standby->command_ok( + [ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn, + '--slot', $folder_slot, '--no-sync'], + "Stream some wal after promoting, resuming from the folder's position"); What is called the "resume-from-folder" test in the patch is IMO too costly and brings little extra value, requiring two commands of pg_receivewal (one to populate the folder and one to test the TLI jump from the previously-populated point) to test basically the same thing as when the starting point is taken from the slot. Except that restarting from the slot is twice cheaper. The point of the resume-from-folder case is to make sure that we restart from the point of the archive folder rather than the slot's restart_lsn, but your test fails this part, in fact, because the first command populating the archive folder also uses "--slot $folder_slot", updating the slot's restart_lsn before the second pg_receivewal uses this slot again. I think, as a whole, that testing for the case where an archive folder is populated (without a slot!), followed by a second command where we use a slot that has a restart_lsn older than the archive's location, to not be that interesting. If we include such a test, there is no need to include that within the TLI jump part, in my opinion. So I think that we had better drop this part of the patch, and keep only the case where we resume from a slot for the TLI jump. The commands of pg_receivewal included in the test had better use -n so as there is no infinite loop on failure. -- Michael
diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl index 092c9b6f25..87edcb6200 100644 --- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl +++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl @@ -15,7 +15,7 @@ program_options_handling_ok('pg_receivewal'); umask(0077); my $primary = PostgreSQL::Test::Cluster->new('primary'); -$primary->init(allows_streaming => 1); +$primary->init(allows_streaming => 1, extra => ['--wal-segsize=1']); $primary->start; my $stream_dir = $primary->basedir . '/archive_wal'; @@ -167,7 +167,9 @@ mkdir($slot_dir); $slot_name = 'archive_slot'; # Setup the slot, reserving WAL at creation (corresponding to the -# last redo LSN here, actually). +# last redo LSN here, actually, so use a checkpoint to reduce the +# number of segments archived). +$primary->psql('postgres', 'checkpoint;'); $primary->psql('postgres', "SELECT pg_create_physical_replication_slot('$slot_name', true);"); @@ -182,12 +184,16 @@ my $walfile_streamed = $primary->safe_psql( # Switch to a new segment, to make sure that the segment retained by the # slot is still streamed. This may not be necessary, but play it safe. $primary->psql('postgres', - 'INSERT INTO test_table VALUES (generate_series(1,100));'); + 'INSERT INTO test_table VALUES (generate_series(1,10));'); $primary->psql('postgres', 'SELECT pg_switch_wal();'); $nextlsn = $primary->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn();'); chomp($nextlsn); +# Add a bit more data to accelerate the end of the next commands. +$primary->psql('postgres', + 'INSERT INTO test_table VALUES (generate_series(1,10));'); + # Check case where the slot does not exist. $primary->command_fails_like( [
signature.asc
Description: PGP signature