On Mon, May 17, 2021 at 10:40 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Mon, May 17, 2021 at 9:06 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > On Thu, May 13, 2021 at 7:06 PM vignesh C <vignes...@gmail.com> wrote: > > > > > > On Thu, May 13, 2021 at 4:41 PM Michael Paquier <mich...@paquier.xyz> > > > wrote: > > > > Few comments: > > 1. > > + # Ensure a transactional logical decoding message shows up on the slot > > + $node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub > > DISABLE"); > > > > After you have encapsulated this command in the function, the above > > comment doesn't make sense because we do this for both transactional > > and non-transactional messages. I suggest we can change it to > > something like: "This is done to ensure a logical decoding message is > > shown up on the slot". > > > > 2. > > +# Setup the subscription before checking > > pg_logical_slot_peek_binary_changes > > +sub setup_subscription > > > > I think here the functionality is more for the catchup of > > subscription, so it might be better to name the function as > > subscription_catchup or catchup_subscription. I think you can expand > > the comments atop this function a bit as suggested by Michael. > > > > One more point: > + $node_publisher->wait_for_catchup('tap_sub'); > + > + # Ensure a transactional logical decoding message shows up on the slot > + $node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub > DISABLE"); > + > + # wait for the replication connection to drop from the publisher > + $node_publisher->poll_query_until('postgres', > + "SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE > slot_name = 'tap_sub' AND active='f'", 1); > > In the above sequence, wait_for_catchup will query pg_stat_replication > whereas after disabling subscription we are checking > pg_replication_slots. I understand from your explanation why we can't > rely on pg_stat_replication after DISABLE but it might be better to > check that the slot is active before disabling it. I think currently, > the test assumes that, isn't it better to have an explicit check for > that?
I felt this is not required, wait_for_catchup will poll_query_until the state = 'streaming', even if START_REPLICATION takes time, state will be in 'startup' state, this way poll_query_until will take care of handling this. On further analysis I found that we need to do "Alter subscription tap_sub ENABLE" and "ALTER subscription tap_sub DISABLE" multiple time, Instead we can change pg_logical_slot_peek_binary_changes to pg_logical_slot_get_binary_changes at appropriate steps. This way the common function can be removed and the enable/disable multiple times can be removed. If we are going ahead with this approach the above comments provided are no more valid. I have made the changes in similar lines in the attached patch. If you are ok we can go ahead with the new approach which will simplify the changes required. Thoughts? Regards, Vignesh
From 33fdfd312a9b652a40891c09dc9e075d2dd41e44 Mon Sep 17 00:00:00 2001 From: vignesh <vignesh21@gmail.com> Date: Mon, 17 May 2021 17:23:35 +0530 Subject: [PATCH v3] Messages test failure fix. This test was failing because there was no wait for catch up after creating subscription. After disabling subscription pg_stat_replication was used to verify if the walsender is exited. The steps of walsender exit includes cleaning up of walsendr and then releasing replication slot. There will be random test failure if pg_logical_slot_peek_binary_changes is called in the narrow time window. Fixed it by checking active column in pg_replication_slot instead of pg_stat_replication which is more reliable. --- src/test/subscription/t/020_messages.pl | 27 +++++-------------------- 1 file changed, 5 insertions(+), 22 deletions(-) diff --git a/src/test/subscription/t/020_messages.pl b/src/test/subscription/t/020_messages.pl index 0940d0f45f..6a76d96d12 100644 --- a/src/test/subscription/t/020_messages.pl +++ b/src/test/subscription/t/020_messages.pl @@ -35,12 +35,14 @@ $node_subscriber->safe_psql('postgres', "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub" ); +$node_publisher->wait_for_catchup('tap_sub'); + # Ensure a transactional logical decoding message shows up on the slot $node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub DISABLE"); # wait for the replication connection to drop from the publisher $node_publisher->poll_query_until('postgres', - 'SELECT COUNT(*) FROM pg_catalog.pg_stat_replication', 0); + "SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'tap_sub' AND active='f'", 1); $node_publisher->safe_psql('postgres', "SELECT pg_logical_emit_message(true, 'pgoutput', 'a transactional message')" @@ -77,7 +79,7 @@ is($result, qq(1|pgoutput), $result = $node_publisher->safe_psql( 'postgres', qq( SELECT get_byte(data, 0) - FROM pg_logical_slot_peek_binary_changes('tap_sub', NULL, NULL, + FROM pg_logical_slot_get_binary_changes('tap_sub', NULL, NULL, 'proto_version', '1', 'publication_names', 'tap_pub') )); @@ -88,16 +90,6 @@ is( $result, qq(66 'option messages defaults to false so message (M) is not available on slot' ); -$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub ENABLE"); -$node_publisher->wait_for_catchup('tap_sub'); - -# ensure a non-transactional logical decoding message shows up on the slot -$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub DISABLE"); - -# wait for the replication connection to drop from the publisher -$node_publisher->poll_query_until('postgres', - 'SELECT COUNT(*) FROM pg_catalog.pg_stat_replication', 0); - $node_publisher->safe_psql('postgres', "INSERT INTO tab_test VALUES (1)"); my $message_lsn = $node_publisher->safe_psql('postgres', @@ -109,7 +101,7 @@ $node_publisher->safe_psql('postgres', "INSERT INTO tab_test VALUES (2)"); $result = $node_publisher->safe_psql( 'postgres', qq( SELECT get_byte(data, 0), get_byte(data, 1) - FROM pg_logical_slot_peek_binary_changes('tap_sub', NULL, NULL, + FROM pg_logical_slot_get_binary_changes('tap_sub', NULL, NULL, 'proto_version', '1', 'publication_names', 'tap_pub', 'messages', 'true') @@ -118,15 +110,6 @@ $result = $node_publisher->safe_psql( is($result, qq(77|0), 'non-transactional message on slot is M'); -$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub ENABLE"); -$node_publisher->wait_for_catchup('tap_sub'); - -$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub DISABLE"); - -# wait for the replication connection to drop from the publisher -$node_publisher->poll_query_until('postgres', - 'SELECT COUNT(*) FROM pg_catalog.pg_stat_replication', 0); - # Ensure a non-transactional logical decoding message shows up on the slot when # it is emitted within an aborted transaction. The message won't emit until # something advances the LSN, hence, we intentionally forces the server to -- 2.25.1