On Sat, Mar 2, 2024 at 9:21 AM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote: > > Apart from the comments, the code in WalSndWaitForWal was refactored > a bit to make it neater. Thanks Shveta for helping writing the code and doc. >
A few more comments: ================== 1. +# Wait until the primary server logs a warning indicating that it is waiting +# for the sb1_slot to catch up. +$primary->wait_for_log( + qr/replication slot \"sb1_slot\" specified in parameter standby_slot_names does not have active_pid/, + $offset); Shouldn't we wait for such a LOG even in the first test as well which involves two standbys and two logical subscribers? 2. +################################################## +# Test that logical replication will wait for the user-created inactive +# physical slot to catch up until we remove the slot from standby_slot_names. +################################################## I don't see anything different tested in this test from what we already tested in the first test involving two standbys and two logical subscribers. Can you please clarify if I am missing something? 3. Note that after receiving the shutdown signal, an ERROR + * is reported if any slots are dropped, invalidated, or inactive. This + * measure is taken to prevent the walsender from waiting indefinitely. + */ + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) Isn't this part of the comment should be moved inside NeedToWaitForStandby()? 4. + /* + * Update our idea of the currently flushed position only if we are + * not waiting for standbys to catch up, otherwise the standby would + * have to catch up to a newer WAL location in each cycle. + */ + if (wait_event != WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION) + { This functionality (in function WalSndWaitForWal()) seems to ensure that we first wait for the required WAL to be flushed and then wait for standbys. If true, we should cover that point in the comments here or somewhere in the function WalSndWaitForWal(). Apart from this, I have made a few modifications in the comments. -- With Regards, Amit Kapila.
diff --git a/src/test/recovery/t/040_standby_failover_slots_sync.pl b/src/test/recovery/t/040_standby_failover_slots_sync.pl index 5e4dd6c0ab..16483e52a9 100644 --- a/src/test/recovery/t/040_standby_failover_slots_sync.pl +++ b/src/test/recovery/t/040_standby_failover_slots_sync.pl @@ -505,8 +505,9 @@ ok( $standby1->poll_query_until( 'restart_lsn and confirmed_flush_lsn of slot lsub1_slot synced to standby'); ################################################## -# Test primary disallowing specified logical replication slots getting ahead of -# specified physical replication slots. It uses the following set up: +# Test that logical failover replication slots wait for the specified +# physical replication slots to receive the changes first. It uses the +# following set up: # # (physical standbys) # | ----> standby1 (primary_slot_name = sb1_slot) @@ -518,9 +519,9 @@ ok( $standby1->poll_query_until( # # standby_slot_names = 'sb1_slot' # -# Set up is configured in such a way that the logical slot of subscriber1 is -# enabled for failover, thus it will wait for the physical slot of -# standby1(sb1_slot) to catch up before sending decoded changes to subscriber1. +# The setup is configured in such a way that the logical slot of subscriber1 is +# enabled for failover, and thus the subscriber1 will wait for the physical +# slot of standby1(sb1_slot) to catch up before receiving the decoded changes. ################################################## $backup_name = 'backup3'; @@ -543,8 +544,8 @@ primary_slot_name = 'sb2_slot' $standby2->start; $primary->wait_for_replay_catchup($standby2); -# Configure primary to disallow any logical slots that enabled failover from -# getting ahead of specified physical replication slot (sb1_slot). +# Configure primary to disallow any logical slots that have enabled failover +# from getting ahead of the specified physical replication slot (sb1_slot). $primary->append_conf( 'postgresql.conf', qq( standby_slot_names = 'sb1_slot'