Hi, here is the remainder of my v45-0001 review. These comments are for the test code only.
====== Testcase #1 1. +# Testcase start +# +# Invalidate streaming standby slot and logical failover slot on primary due to +# inactive timeout. Also, check logical failover slot synced to standby from +# primary doesn't invalidate on its own, but gets the invalidated state from the +# primary. nit - s/primary/the primary/ (in a couple of places) nit - s/standby/the standby/ nit - other trivial tweaks. ~~~ 2. +# Create sync slot on primary +$primary->psql('postgres', + q{SELECT pg_create_logical_replication_slot('sync_slot1', 'test_decoding', false, false, true);} +); nit - s/primary/the primary/ ~~~ 3. +$primary->safe_psql( + 'postgres', qq[ + SELECT pg_create_physical_replication_slot(slot_name := 'sb_slot1', immediately_reserve := true); +]); Should this have a comment? ~~~ 4. +# Wait until standby has replayed enough data +$primary->wait_for_catchup($standby1); nit - s/standby/the standby/ ~~~ 5. +# Sync primary slot to standby +$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();"); nit - /Sync primary slot to standby/Sync the primary slots to the standby/ ~~~ 6. +# Confirm that logical failover slot is created on standby nit - s/Confirm that logical failover slot is created on standby/Confirm that the logical failover slot is created on the standby/ ~~~ 7. +is( $standby1->safe_psql( + 'postgres', + q{SELECT count(*) = 1 FROM pg_replication_slots + WHERE slot_name = 'sync_slot1' AND synced AND NOT temporary;} + ), + "t", + 'logical slot sync_slot1 has synced as true on standby'); IMO here you should also be checking that the sync slot state is NOT invalidated, just as a counterpoint for the test part later that checks that it IS invalidated. ~~~ 8. +my $inactive_timeout = 1; + +# Set timeout so that next checkpoint will invalidate inactive slot +$primary->safe_psql( + 'postgres', qq[ + ALTER SYSTEM SET replication_slot_inactive_timeout TO '${inactive_timeout}s'; +]); +$primary->reload; 8a. nit - I think that $inactive_timeout assignment belongs below your comment. ~ 8b. nit - s/Set timeout so that next checkpoint will invalidate inactive slot/Set timeout GUC so that the next checkpoint will invalidate inactive slots/ ~~~ 9. +# Check for logical failover slot to become inactive on primary. Note that +# nobody has acquired slot yet, so it must get invalidated due to +# inactive timeout. nit - /Check for logical failover slot to become inactive on primary./Wait for logical failover slot to become inactive on the primary./ nit - /has acquired slot/has acquired the slot/ ~~~ 10. +# Sync primary slot to standby. Note that primary slot has already been +# invalidated due to inactive timeout. Standby must just sync inavalidated +# state. nit - minor, add "the". fix typo "inavalidated", etc. suggestion: Re-sync the primary slots to the standby. Note that the primary slot was already invalidated (above) due to inactive timeout. The standby must just sync the invalidated state. ~~~ 11. +# Make standby slot on primary inactive and check for invalidation +$standby1->stop; nit - /standby slot/the standby slot/ nit - /on primary/on the primary/ ====== Testcase #2 12. I'm not sure it is necessary to do all this extra work. IIUC, there was already almost everything you needed in the previous Testcase #1. So, I thought you could just combine this extra standby timeout test in Testcase #1. Indeed, your Testcase #1 comment still says it is doing this: ("Also, check logical failover slot synced to standby from primary doesn't invalidate on its own,...") e.g. - NEW: set the GUC timeout on the standby - sync the sync_slot (already doing in test #1) - ensure the synced slot is NOT invalid (already suggested above for test #1) - NEW: then do a standby sleep > timeout duration - NEW: then do a standby CHECKPOINT... - NEW: then ensure the sync slot invalidation did NOT happen - then proceed with the rest of test #1... ====== Testcase #3 13. nit - remove a few blank lines to group associated statements together. ~~~ 14. +$publisher->safe_psql( + 'postgres', qq[ + ALTER SYSTEM SET replication_slot_inactive_timeout TO ' ${inactive_timeout}s'; +]); +$publisher->reload; nit - this deserves a comment, the same as in Testcase #1 ====== sub wait_for_slot_invalidation 15. +# Check for slot to first become inactive and then get invalidated +sub check_for_slot_invalidation nit - IMO the previous name was better (e.g. "wait_for.." instead of "check_for...") because that describes exactly what the subroutine is doing. suggestion: # Wait for the slot to first become inactive and then get invalidated sub wait_for_slot_invalidation ~~~ 16. +{ + my ($node, $slot, $offset, $inactive_timeout) = @_; + my $name = $node->name; The variable $name seems too vague. How about $node_name? ~~~ 17. + # Wait for invalidation reason to be set + $node->poll_query_until( + 'postgres', qq[ + SELECT COUNT(slot_name) = 1 FROM pg_replication_slots + WHERE slot_name = '$slot' AND + invalidation_reason = 'inactive_timeout'; + ]) + or die + "Timed out while waiting for invalidation reason of slot $slot to be set on node $name"; 17a. nit - /# Wait for invalidation reason to be set/# Check that the invalidation reason is 'inactive_timeout'/ IIUC, the 'trigger_slot_invalidation' function has already invalidated the slot at this point, so we are not really "Waiting..."; we are "Checking..." that the reason was correctly set. ~ 17b. I think this code fragment maybe would be better put inside the 'trigger_slot_invalidation' function. (I've done this in the nitpicks attachment) ~~~ 18. + # Check that invalidated slot cannot be acquired + my ($result, $stdout, $stderr); + + ($result, $stdout, $stderr) = $node->psql( + 'postgres', qq[ + SELECT pg_replication_slot_advance('$slot', '0/1'); + ]); 18a. s/Check that invalidated slot/Check that an invalidated slot/ ~ 18b. nit - Remove some blank lines, because the comment applies to all below it. ====== sub trigger_slot_invalidation 19. +# Trigger slot invalidation and confirm it in server log +sub trigger_slot_invalidation nit - s/confirm it in server log/confirm it in the server log/ ~ 20. +{ + my ($node, $slot, $offset, $inactive_timeout) = @_; + my $name = $node->name; + my $invalidated = 0; (same as the other subroutine) nit - The variable $name seems too vague. How about $node_name? ====== Please refer to the attached nitpicks top-up patch which implements most of the above nits. ====== Kind Regards, Peter Smith. Fujitsu Australia
diff --git a/src/test/recovery/t/050_invalidate_slots.pl b/src/test/recovery/t/050_invalidate_slots.pl index 669d6cc..34b46d5 100644 --- a/src/test/recovery/t/050_invalidate_slots.pl +++ b/src/test/recovery/t/050_invalidate_slots.pl @@ -12,10 +12,10 @@ use Time::HiRes qw(usleep); # ============================================================================= # Testcase start # -# Invalidate streaming standby slot and logical failover slot on primary due to -# inactive timeout. Also, check logical failover slot synced to standby from -# primary doesn't invalidate on its own, but gets the invalidated state from the -# primary. +# Invalidate a streaming standby slot and logical failover slot on the primary +# due to inactive timeout. Also, check that a logical failover slot synced to +# the standby from the primary doesn't invalidate on its own, but gets the +# invalidated state from the primary. # Initialize primary my $primary = PostgreSQL::Test::Cluster->new('primary'); @@ -45,7 +45,7 @@ primary_slot_name = 'sb_slot1' primary_conninfo = '$connstr dbname=postgres' )); -# Create sync slot on primary +# Create sync slot on the primary $primary->psql('postgres', q{SELECT pg_create_logical_replication_slot('sync_slot1', 'test_decoding', false, false, true);} ); @@ -57,13 +57,13 @@ $primary->safe_psql( $standby1->start; -# Wait until standby has replayed enough data +# Wait until the standby has replayed enough data $primary->wait_for_catchup($standby1); -# Sync primary slot to standby +# Sync the primary slots to the standby $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();"); -# Confirm that logical failover slot is created on standby +# Confirm that the logical failover slot is created on the standby is( $standby1->safe_psql( 'postgres', q{SELECT count(*) = 1 FROM pg_replication_slots @@ -73,24 +73,24 @@ is( $standby1->safe_psql( 'logical slot sync_slot1 has synced as true on standby'); my $logstart = -s $primary->logfile; -my $inactive_timeout = 1; -# Set timeout so that next checkpoint will invalidate inactive slot +# Set timeout GUC so that that next checkpoint will invalidate inactive slots +my $inactive_timeout = 1; $primary->safe_psql( 'postgres', qq[ ALTER SYSTEM SET replication_slot_inactive_timeout TO '${inactive_timeout}s'; ]); $primary->reload; -# Check for logical failover slot to become inactive on primary. Note that +# Wait for logical failover slot to become inactive on the primary. Note that # nobody has acquired slot yet, so it must get invalidated due to # inactive timeout. -check_for_slot_invalidation($primary, 'sync_slot1', $logstart, +wait_for_slot_invalidation($primary, 'sync_slot1', $logstart, $inactive_timeout); -# Sync primary slot to standby. Note that primary slot has already been -# invalidated due to inactive timeout. Standby must just sync inavalidated -# state. +# Re-sync the primary slots to the standby. Note that the primary slot was +# already invalidated (above) due to inactive timeout. The standby must just +# sync the invalidated state. $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();"); $standby1->poll_query_until( 'postgres', qq[ @@ -101,9 +101,9 @@ $standby1->poll_query_until( or die "Timed out while waiting for sync_slot1 invalidation to be synced on standby"; -# Make standby slot on primary inactive and check for invalidation +# Make the standby slot on the primary inactive and check for invalidation $standby1->stop; -check_for_slot_invalidation($primary, 'sb_slot1', $logstart, +wait_for_slot_invalidation($primary, 'sb_slot1', $logstart, $inactive_timeout); # Testcase end @@ -223,14 +223,12 @@ $publisher->safe_psql( $subscriber->safe_psql('postgres', "CREATE SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub WITH (slot_name = 'lsub1_slot', create_slot = false)" ); - $subscriber->wait_for_subscription_sync($publisher, 'sub'); - my $result = $subscriber->safe_psql('postgres', "SELECT count(*) FROM test_tbl"); - is($result, qq(5), "check initial copy was done"); +# Set timeout GUC so that that next checkpoint will invalidate inactive slots $publisher->safe_psql( 'postgres', qq[ ALTER SYSTEM SET replication_slot_inactive_timeout TO ' ${inactive_timeout}s'; @@ -241,17 +239,17 @@ $logstart = -s $publisher->logfile; # Make subscriber slot on publisher inactive and check for invalidation $subscriber->stop; -check_for_slot_invalidation($publisher, 'lsub1_slot', $logstart, +wait_for_slot_invalidation($publisher, 'lsub1_slot', $logstart, $inactive_timeout); # Testcase end # ============================================================================= -# Check for slot to first become inactive and then get invalidated -sub check_for_slot_invalidation +# Wait for slot to first become inactive and then get invalidated +sub wait_for_slot_invalidation { my ($node, $slot, $offset, $inactive_timeout) = @_; - my $name = $node->name; + my $node_name = $node->name; # Wait for slot to become inactive $node->poll_query_until( @@ -261,45 +259,33 @@ sub check_for_slot_invalidation inactive_since IS NOT NULL; ]) or die - "Timed out while waiting for slot $slot to become inactive on node $name"; + "Timed out while waiting for slot $slot to become inactive on node $node_name"; trigger_slot_invalidation($node, $slot, $offset, $inactive_timeout); - # Wait for invalidation reason to be set - $node->poll_query_until( - 'postgres', qq[ - SELECT COUNT(slot_name) = 1 FROM pg_replication_slots - WHERE slot_name = '$slot' AND - invalidation_reason = 'inactive_timeout'; - ]) - or die - "Timed out while waiting for invalidation reason of slot $slot to be set on node $name"; - - # Check that invalidated slot cannot be acquired + # Check that an invalidated slot cannot be acquired my ($result, $stdout, $stderr); - ($result, $stdout, $stderr) = $node->psql( 'postgres', qq[ SELECT pg_replication_slot_advance('$slot', '0/1'); ]); - ok( $stderr =~ /can no longer get changes from replication slot "$slot"/, - "detected error upon trying to acquire invalidated slot $slot on node $name" + "detected error upon trying to acquire invalidated slot $slot on node $node_name" ) or die - "could not detect error upon trying to acquire invalidated slot $slot on node $name"; + "could not detect error upon trying to acquire invalidated slot $slot on node $node_name"; } -# Trigger slot invalidation and confirm it in server log +# Trigger slot invalidation and confirm it in the server log sub trigger_slot_invalidation { my ($node, $slot, $offset, $inactive_timeout) = @_; - my $name = $node->name; + my $node_name = $node->name; my $invalidated = 0; # Give enough time to avoid multiple checkpoints - sleep($inactive_timeout+1); + sleep($inactive_timeout + 1); for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++) { @@ -314,8 +300,18 @@ sub trigger_slot_invalidation usleep(100_000); } ok($invalidated, - "check that slot $slot invalidation has been logged on node $name" + "check that slot $slot invalidation has been logged on node $node_name" ); + + # Check that the invalidation reason is 'inactive_timeout' + $node->poll_query_until( + 'postgres', qq[ + SELECT COUNT(slot_name) = 1 FROM pg_replication_slots + WHERE slot_name = '$slot' AND + invalidation_reason = 'inactive_timeout'; + ]) + or die + "Timed out while waiting for invalidation reason of slot $slot to be set on node $node_name"; } done_testing();