On Wed, 29 Jan 2025 at 15:58, vignesh C <vignes...@gmail.com> wrote: > > On Fri, 24 Jan 2025 at 09:52, Shlok Kyal <shlok.kyal....@gmail.com> wrote: > > > > I have added the test in the latest patch. > > Few comments: > 1) Let's rearrange this query slightly so that the "PT.pubname IN > (<pub-names>)" appears at the end, the reason being that it will > be easy to copy/paste and edit it to include the publications names if > it is at the end: > +++ b/doc/src/sgml/ref/create_subscription.sgml > @@ -534,13 +534,15 @@ CREATE SUBSCRIPTION <replaceable > class="parameter">subscription_name</replaceabl > <programlisting> > # substitute <pub-names> below with your publication name(s) to > be queried > SELECT DISTINCT PT.schemaname, PT.tablename > -FROM pg_publication_tables PT, > +FROM pg_publication_tables PT > + JOIN pg_class C ON (C.relname = PT.tablename) > + JOIN pg_namespace N ON (N.nspname = PT.schemaname), > pg_subscription_rel PS > - JOIN pg_class C ON (C.oid = PS.srrelid) > - JOIN pg_namespace N ON (N.oid = C.relnamespace) > -WHERE N.nspname = PT.schemaname AND > - C.relname = PT.tablename AND > - PT.pubname IN (<pub-names>); > +WHERE C.relnamespace = N.oid AND > + PT.pubname IN (<pub-names>) AND > + (PS.srrelid = C.oid OR > + C.oid IN (SELECT relid FROM pg_partition_ancestors(PS.srrelid) UNION > + SELECT relid FROM pg_partition_tree(PS.srrelid))); > > 2) The same should be handled in the PG17 version patch too. > > 3) Currently the setup is done like: > node_B(table tab_part2 - publication pub_b_a) replicating to > node_A(sub_a_b subscription) > node_A(table tab_main - publication pub_a_c) replicating to node_C(sub_a_c) > > +############################################################################### > +# Specifying origin = NONE and copy_data = on must raise WARNING if > we subscribe > +# to a partitioned table and this table contains any remotely originated > data. > +############################################################################### > + > +# create a partition table on node A > +$node_A->safe_psql( > + 'postgres', qq( > +CREATE TABLE tab_main(a int) PARTITION BY RANGE(a); > +CREATE TABLE tab_part1 PARTITION OF tab_main FOR VALUES FROM (0) TO (5); > +CREATE TABLE tab_part2(a int) PARTITION BY RANGE(a); > +CREATE TABLE tab_part2_1 PARTITION OF tab_part2 FOR VALUES FROM (5) TO (10); > +ALTER TABLE tab_main ATTACH PARTITION tab_part2 FOR VALUES FROM (5) to (10); > +)); > + > +# create a table on node B which will act as a source for a partition on > node A > +$node_B->safe_psql( > + 'postgres', qq( > > Can we change this like below to make review easier: > node_A(table tab_part2 - publication pub_b_a) replicating to > node_B(sub_a_b subscription) > node_B(table tab_main - publication pub_a_c) replicating to node_C(sub_a_c) > > Also add something similar like above to the comment. >
I have addressed the comments. Here is an updated patch. Thanks and Regards, Shlok Kyal
From ab9f7ffa4810959577b1d0690519717e8543485a Mon Sep 17 00:00:00 2001 From: Hou Zhijie <houzj.fnst@cn.fujitsu.com> Date: Tue, 21 Jan 2025 16:18:34 +0800 Subject: [PATCH v8 1/2] Improve logging for data origin discrepancies in table synchronization Previously, a WARNING was issued during initial table synchronization with origin=NONE only when the publisher subscribed to the same table from other publishers, indicating potential data origination from different origins. However, it's possible for the publisher to subscribe to the partition ancestors or partition children of the table from other publishers, which could also result in mixed-origin data inclusion. This patch expands the check to consider both the subscribed table's ancestors and children. A WARNING will now be logged if any of these related tables could contain data originating from different sources. --- doc/src/sgml/ref/create_subscription.sgml | 12 +++++++----- src/backend/commands/subscriptioncmds.c | 15 +++++++++------ 2 files changed, 16 insertions(+), 11 deletions(-) diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index 6cf7d4f9a1..57dec28a5d 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -534,12 +534,14 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl <programlisting> # substitute <pub-names> below with your publication name(s) to be queried SELECT DISTINCT PT.schemaname, PT.tablename -FROM pg_publication_tables PT, +FROM pg_publication_tables PT + JOIN pg_class C ON (C.relname = PT.tablename) + JOIN pg_namespace N ON (N.nspname = PT.schemaname), pg_subscription_rel PS - JOIN pg_class C ON (C.oid = PS.srrelid) - JOIN pg_namespace N ON (N.oid = C.relnamespace) -WHERE N.nspname = PT.schemaname AND - C.relname = PT.tablename AND +WHERE C.relnamespace = N.oid AND + (PS.srrelid = C.oid OR + C.oid IN (SELECT relid FROM pg_partition_ancestors(PS.srrelid) UNION + SELECT relid FROM pg_partition_tree(PS.srrelid))) AND PT.pubname IN (<pub-names>); </programlisting></para> diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 2d8a71ca1e..4aec73bcc6 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -2083,11 +2083,12 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId) } /* - * Check and log a warning if the publisher has subscribed to the same table - * from some other publisher. This check is required only if "copy_data = true" - * and "origin = none" for CREATE SUBSCRIPTION and - * ALTER SUBSCRIPTION ... REFRESH statements to notify the user that data - * having origin might have been copied. + * Check and log a warning if the publisher has subscribed to the same table, + * its partition ancestors (if it's a partition), or its partition children (if + * it's a partitioned table), from some other publishers. This check is + * required only if "copy_data = true" and "origin = none" for CREATE + * SUBSCRIPTION and ALTER SUBSCRIPTION ... REFRESH statements to notify the + * user that data having origin might have been copied. * * This check need not be performed on the tables that are already added * because incremental sync for those tables will happen through WAL and the @@ -2117,7 +2118,9 @@ check_publications_origin(WalReceiverConn *wrconn, List *publications, "SELECT DISTINCT P.pubname AS pubname\n" "FROM pg_publication P,\n" " LATERAL pg_get_publication_tables(P.pubname) GPT\n" - " JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n" + " JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid OR" + " GPT.relid IN (SELECT relid FROM pg_partition_ancestors(PS.srrelid) UNION" + " SELECT relid FROM pg_partition_tree(PS.srrelid))),\n" " pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n" "WHERE C.oid = GPT.relid AND P.pubname IN ("); GetPublicationsStr(publications, &cmd, true); -- 2.34.1
From 7da437f55ec13134d070d1a1aeed143dbc856a98 Mon Sep 17 00:00:00 2001 From: Shlok Kyal <shlok.kyal.oss@gmail.com> Date: Thu, 23 Jan 2025 12:59:19 +0530 Subject: [PATCH v8-PG_17-PG_16] Improve logging for data origin discrepancies in table synchronization Previously, a WARNING was issued during initial table synchronization with origin=NONE only when the publisher subscribed to the same table from other publishers, indicating potential data origination from different origins. However, it's possible for the publisher to subscribe to the partition ancestors or partition children of the table from other publishers, which could also result in mixed-origin data inclusion. This patch expands the check to consider both the subscribed table's ancestors and children. A WARNING will now be logged if any of these related tables could contain data originating from different sources. --- doc/src/sgml/ref/create_subscription.sgml | 12 +++++++----- src/backend/commands/subscriptioncmds.c | 15 +++++++++------ 2 files changed, 16 insertions(+), 11 deletions(-) diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index 740b7d9421..c9c8dd440d 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -526,12 +526,14 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl <programlisting> # substitute <pub-names> below with your publication name(s) to be queried SELECT DISTINCT PT.schemaname, PT.tablename -FROM pg_publication_tables PT, +FROM pg_publication_tables PT + JOIN pg_class C ON (C.relname = PT.tablename) + JOIN pg_namespace N ON (N.nspname = PT.schemaname), pg_subscription_rel PS - JOIN pg_class C ON (C.oid = PS.srrelid) - JOIN pg_namespace N ON (N.oid = C.relnamespace) -WHERE N.nspname = PT.schemaname AND - C.relname = PT.tablename AND +WHERE C.relnamespace = N.oid AND + (PS.srrelid = C.oid OR + C.oid IN (SELECT relid FROM pg_partition_ancestors(PS.srrelid) UNION + SELECT relid FROM pg_partition_tree(PS.srrelid))) AND PT.pubname IN (<pub-names>); </programlisting></para> diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 8ecb6e0bb8..9467f58a23 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -2012,11 +2012,12 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId) } /* - * Check and log a warning if the publisher has subscribed to the same table - * from some other publisher. This check is required only if "copy_data = true" - * and "origin = none" for CREATE SUBSCRIPTION and - * ALTER SUBSCRIPTION ... REFRESH statements to notify the user that data - * having origin might have been copied. + * Check and log a warning if the publisher has subscribed to the same table, + * its partition ancestors (if it's a partition), or its partition children (if + * it's a partitioned table), from some other publishers. This check is + * required only if "copy_data = true" and "origin = none" for CREATE + * SUBSCRIPTION and ALTER SUBSCRIPTION ... REFRESH statements to notify the + * user that data having origin might have been copied. * * This check need not be performed on the tables that are already added * because incremental sync for those tables will happen through WAL and the @@ -2046,7 +2047,9 @@ check_publications_origin(WalReceiverConn *wrconn, List *publications, "SELECT DISTINCT P.pubname AS pubname\n" "FROM pg_publication P,\n" " LATERAL pg_get_publication_tables(P.pubname) GPT\n" - " JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n" + " JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid OR" + " GPT.relid IN (SELECT relid FROM pg_partition_ancestors(PS.srrelid) UNION" + " SELECT relid FROM pg_partition_tree(PS.srrelid))),\n" " pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n" "WHERE C.oid = GPT.relid AND P.pubname IN ("); get_publications_str(publications, &cmd, true); -- 2.34.1
From 10d4cab23486d04004114ef415cbd4b79ff31d0e Mon Sep 17 00:00:00 2001 From: Shlok Kyal <shlok.kyal.oss@gmail.com> Date: Wed, 22 Jan 2025 20:59:17 +0530 Subject: [PATCH v8 2/2] Test for Improve logging for data origin discrepancies in table synchronization --- src/test/subscription/t/030_origin.pl | 104 ++++++++++++++++++++++++++ 1 file changed, 104 insertions(+) diff --git a/src/test/subscription/t/030_origin.pl b/src/test/subscription/t/030_origin.pl index eb1dcb7271..390623d2da 100644 --- a/src/test/subscription/t/030_origin.pl +++ b/src/test/subscription/t/030_origin.pl @@ -255,6 +255,110 @@ $node_A->safe_psql('postgres', "DROP TABLE tab_new"); $node_B->safe_psql('postgres', "DROP TABLE tab_new"); $node_A->safe_psql('postgres', "DROP SUBSCRIPTION $subname_AB2"); +############################################################################### +# Specifying origin = NONE and copy_data = on must raise WARNING if we subscribe +# to a partitioned table and this table contains any remotely originated data. +# +# node_B +# __________________________ +# | tab_main | --------------> node_C (tab_main) +# |__________________________| +# | tab_part1 | tab_part2 | <-------------- node_A (tab_part2) +# |____________|_____________| +# | tab_part2_1 | +# |_____________| +# +# node_B +# __________________________ +# | tab_main | +# |__________________________| +# | tab_part1 | tab_part2 | <-------------- node_A (tab_part2) +# |____________|_____________| +# | tab_part2_1 | --------------> node_C (tab_part2_1) +# |_____________| +############################################################################### + +# create a table on node A which will act as a source for a partition on node B +$node_A->safe_psql( + 'postgres', qq( +CREATE TABLE tab_part2(a int); +CREATE PUBLICATION pub_a_b FOR TABLE tab_part2; +)); + +# create a partition table on node B +$node_B->safe_psql( + 'postgres', qq( +CREATE TABLE tab_main(a int) PARTITION BY RANGE(a); +CREATE TABLE tab_part1 PARTITION OF tab_main FOR VALUES FROM (0) TO (5); +CREATE TABLE tab_part2(a int) PARTITION BY RANGE(a); +CREATE TABLE tab_part2_1 PARTITION OF tab_part2 FOR VALUES FROM (5) TO (10); +ALTER TABLE tab_main ATTACH PARTITION tab_part2 FOR VALUES FROM (5) to (10); +CREATE SUBSCRIPTION sub_a_b CONNECTION '$node_A_connstr' PUBLICATION pub_a_b; +)); + +# create a table on node C +$node_C->safe_psql( + 'postgres', qq( +CREATE TABLE tab_main(a int); +CREATE TABLE tab_part2_1(a int); +)); + +# create a logical replication setup between node B and node C with +# subscription on node C having origin = NONE and copy_data = on +$node_B->safe_psql( + 'postgres', qq( +CREATE PUBLICATION pub_b_c FOR TABLE tab_main WITH (publish_via_partition_root); +CREATE PUBLICATION pub_b_c_2 FOR TABLE tab_part2_1; +)); + +($result, $stdout, $stderr) = $node_C->psql( + 'postgres', " + CREATE SUBSCRIPTION sub_b_c CONNECTION '$node_B_connstr' PUBLICATION pub_b_c WITH (origin = none, copy_data = on); +"); + +# A warning must be logged as a partition 'tab_part2' in node B is subscribed to +# node A so partition 'tab_part2' can have remotely originated data +like( + $stderr, + qr/WARNING: ( [A-Z0-9]+:)? subscription "sub_b_c" requested copy_data with origin = NONE but might copy data that had a different origin/, + "Create subscription with origin = none and copy_data when the publisher's partition is subscribing from different origin" +); +$node_C->safe_psql('postgres', "DROP SUBSCRIPTION sub_b_c"); + +($result, $stdout, $stderr) = $node_C->psql( + 'postgres', " + CREATE SUBSCRIPTION sub_b_c CONNECTION '$node_B_connstr' PUBLICATION pub_b_c_2 WITH (origin = none, copy_data = on); +"); + +# A warning must be logged as ancestor of table 'tab_part2_1' in node A have +# been subscribed to node B so table 'tab_part2_1' can have remotely originated +# data +like( + $stderr, + qr/WARNING: ( [A-Z0-9]+:)? subscription "sub_b_c" requested copy_data with origin = NONE but might copy data that had a different origin/, + "Create subscription with origin = none and copy_data when the publisher's ancestor is subscribing from different origin" +); + +# clear the operations done by this test +$node_C->safe_psql( + 'postgres', qq( +DROP SUBSCRIPTION sub_b_c; +DROP TABLE tab_main; +DROP TABLE tab_part2_1; +)); +$node_B->safe_psql( + 'postgres', qq( +DROP SUBSCRIPTION sub_a_b; +DROP PUBLICATION pub_b_c; +DROP PUBLICATION pub_b_c_2; +DROP TABLE tab_main; +)); +$node_A->safe_psql( + 'postgres', qq( +DROP PUBLICATION pub_a_b; +DROP TABLE tab_part2; +)); + # shutdown $node_B->stop('fast'); $node_A->stop('fast'); -- 2.34.1