On Tue, May 18, 2021 at 9:09 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Mon, May 17, 2021 at 5:48 PM vignesh C <vignes...@gmail.com> wrote: > > > > On Mon, May 17, 2021 at 10:40 AM Amit Kapila <amit.kapil...@gmail.com> > > wrote: > > > > > > One more point: > > > + $node_publisher->wait_for_catchup('tap_sub'); > > > + > > > + # Ensure a transactional logical decoding message shows up on the slot > > > + $node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub > > > DISABLE"); > > > + > > > + # wait for the replication connection to drop from the publisher > > > + $node_publisher->poll_query_until('postgres', > > > + "SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE > > > slot_name = 'tap_sub' AND active='f'", 1); > > > > > > In the above sequence, wait_for_catchup will query pg_stat_replication > > > whereas after disabling subscription we are checking > > > pg_replication_slots. I understand from your explanation why we can't > > > rely on pg_stat_replication after DISABLE but it might be better to > > > check that the slot is active before disabling it. I think currently, > > > the test assumes that, isn't it better to have an explicit check for > > > that? > > > > I felt this is not required, wait_for_catchup will poll_query_until > > the state = 'streaming', even if START_REPLICATION takes time, state > > will be in 'startup' state, this way poll_query_until will take care > > of handling this. > > > > makes sense, but let's add some comments to clarify the same. >
Modified. > > On further analysis I found that we need to do "Alter subscription > > tap_sub ENABLE" and "ALTER subscription tap_sub DISABLE" multiple > > time, Instead we can change pg_logical_slot_peek_binary_changes to > > pg_logical_slot_get_binary_changes at appropriate steps. This way the > > common function can be removed and the enable/disable multiple times > > can be removed. > > > > I think that is a valid point. This was probably kept so that we can > peek multiple times for the same message to test various things but > that can be achieved with the way you have changed the test. > > One more thing, shouldn't we make auto_vacuum=off for this test by > using 'append_conf' before starting the publisher. That will avoid the > risk of empty transactions. I felt that makes sense, added it. Thanks for the comments, the attached patch has the changes for the same. Regards, Vignesh
From 7617be76996efd17f46cee16f9a9a18a23183f19 Mon Sep 17 00:00:00 2001 From: vignesh <vignes...@gmail.com> Date: Mon, 17 May 2021 17:23:35 +0530 Subject: [PATCH v4] Messages test failure fix. This test was failing because there was no wait for catch up after creating subscription. After disabling subscription pg_stat_replication was used to verify if the walsender is exited. The steps of walsender exit includes cleaning up of walsendr and then releasing replication slot. There will be random test failure if pg_logical_slot_peek_binary_changes is called in the narrow time window. Fixed it by checking active column in pg_replication_slot instead of pg_stat_replication which is more reliable. --- src/test/subscription/t/020_messages.pl | 32 ++++++++----------------- 1 file changed, 10 insertions(+), 22 deletions(-) diff --git a/src/test/subscription/t/020_messages.pl b/src/test/subscription/t/020_messages.pl index 0940d0f45f..aaafc9f11d 100644 --- a/src/test/subscription/t/020_messages.pl +++ b/src/test/subscription/t/020_messages.pl @@ -11,6 +11,8 @@ use Test::More tests => 5; # Create publisher node my $node_publisher = get_new_node('publisher'); $node_publisher->init(allows_streaming => 'logical'); +$node_publisher->append_conf('postgresql.conf', + 'autovacuum = off'); $node_publisher->start; # Create subscriber node @@ -35,12 +37,17 @@ $node_subscriber->safe_psql('postgres', "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub" ); +# Here wait_for_catchup is used intentionally instead of pg_replication_slots +# as wait_for_catchup waits on poll_query_until until the state becomes +# 'streaming', which will make sure that the replication slot is active. +$node_publisher->wait_for_catchup('tap_sub'); + # Ensure a transactional logical decoding message shows up on the slot $node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub DISABLE"); # wait for the replication connection to drop from the publisher $node_publisher->poll_query_until('postgres', - 'SELECT COUNT(*) FROM pg_catalog.pg_stat_replication', 0); + "SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'tap_sub' AND active='f'", 1); $node_publisher->safe_psql('postgres', "SELECT pg_logical_emit_message(true, 'pgoutput', 'a transactional message')" @@ -77,7 +84,7 @@ is($result, qq(1|pgoutput), $result = $node_publisher->safe_psql( 'postgres', qq( SELECT get_byte(data, 0) - FROM pg_logical_slot_peek_binary_changes('tap_sub', NULL, NULL, + FROM pg_logical_slot_get_binary_changes('tap_sub', NULL, NULL, 'proto_version', '1', 'publication_names', 'tap_pub') )); @@ -88,16 +95,6 @@ is( $result, qq(66 'option messages defaults to false so message (M) is not available on slot' ); -$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub ENABLE"); -$node_publisher->wait_for_catchup('tap_sub'); - -# ensure a non-transactional logical decoding message shows up on the slot -$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub DISABLE"); - -# wait for the replication connection to drop from the publisher -$node_publisher->poll_query_until('postgres', - 'SELECT COUNT(*) FROM pg_catalog.pg_stat_replication', 0); - $node_publisher->safe_psql('postgres', "INSERT INTO tab_test VALUES (1)"); my $message_lsn = $node_publisher->safe_psql('postgres', @@ -109,7 +106,7 @@ $node_publisher->safe_psql('postgres', "INSERT INTO tab_test VALUES (2)"); $result = $node_publisher->safe_psql( 'postgres', qq( SELECT get_byte(data, 0), get_byte(data, 1) - FROM pg_logical_slot_peek_binary_changes('tap_sub', NULL, NULL, + FROM pg_logical_slot_get_binary_changes('tap_sub', NULL, NULL, 'proto_version', '1', 'publication_names', 'tap_pub', 'messages', 'true') @@ -118,15 +115,6 @@ $result = $node_publisher->safe_psql( is($result, qq(77|0), 'non-transactional message on slot is M'); -$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub ENABLE"); -$node_publisher->wait_for_catchup('tap_sub'); - -$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub DISABLE"); - -# wait for the replication connection to drop from the publisher -$node_publisher->poll_query_until('postgres', - 'SELECT COUNT(*) FROM pg_catalog.pg_stat_replication', 0); - # Ensure a non-transactional logical decoding message shows up on the slot when # it is emitted within an aborted transaction. The message won't emit until # something advances the LSN, hence, we intentionally forces the server to -- 2.25.1