Below are some review comments for the patch v18-0003 1. Commit message
This patch does a couple of things: change 1) Checks and throws an error if the publication tables were also subscribing data in the publisher from other publishers when copy_data parameter is specified as 'ON' and origin parameter is specified as 'local'. change 2) Adds force value for copy_data parameter. SUGGESTION This patch does a couple of things: change 1) Checks and throws an error if 'copy_data = on' and 'origin = local' but the publication tables were also subscribing data in the publisher from other publishers. change 2) Adds 'force' value for copy_data parameter. ~~~ 2. Commit message - about the example All my following review comments for the commit message are assuming that the example steps are as they are written in the patch, but actually I felt that the example might be more complex than it needed to be: e.g - You don’t really need the node2 to have data - Therefore you don’t need all the added TRUNCATE complications E.g. I think you only need node1 (with data) and node2 (no data). Then node1 subscribes node2 with (origin=local, copy_data=off). Then node2 subscribes node1 with (origin=local, copy_data=on). - Demonstrates exception happens because node1 already had a subscription - Demonstrates need for the copy_data=force to override that exception So please consider using a simpler example for this commit message ~~~ 3. Commit message The below help us understand how the first change will be useful: If copy_data parameter was used with 'on' in step 5, then an error will be thrown to alert the user to prevent inconsistent data being populated: CREATE SUBSCRIPTION sub_node2_node1 CONNECTION '<node1 details>' PUBLICATION pub_node1 WITH (copy_data = on, origin = local); ERROR: CREATE/ALTER SUBSCRIPTION with origin and copy_data as true is not allowed when the publisher might have replicated data SUGGESTION The steps below help to demonstrate how the new exception is useful: The initial copy phase has no way to know the origin of the row data, so if 'copy_data = on' in the step 5 below, then an error will be thrown to prevent any potentially non-local data from being copied: <blank line> e.g CREATE SUBSCRIPTION ... ~~~ 4. Commit message The following will help us understand how the second change will be useful: Let's take a simple case where user is trying to setup bidirectional logical replication between node1 and node2 where the two nodes have some pre-existing data like below: node1: Table t1 (c1 int) has data 11, 12, 13, 14 node2: Table t1 (c1 int) has data 21, 22, 23, 24 SUGGESTION The following steps help to demonstrate how the 'copy_data = force' change will be useful: <blank line> Let's take a scenario where the user wants to set up bidirectional logical replication between node1 and node2 where the same table on each node has pre-existing data. e.g. node1: Table t1 (c1 int) has data 11, 12, 13, 14 node2: Table t1 (c1 int) has data 21, 22, 23, 24 ~~~ 5. Commit message step 4: node2=# CREATE SUBSCRIPTION sub_node2_node1 Connection '<node1 details>' node2-# PUBLICATION pub_node1; CREATE SUBSCRIPTION "Connection" => "CONNECTION" ~~~ 6. Commit message If table t1 has a unique key, it will cause a unique key violation and replication won't proceed. SUGGESTION In case, table t1 has a unique key, it will lead to a unique key violation and replication won't proceed. ~~~ 7. Commit message step 3: Create a subscription in node1 to subscribe to node2. Use copy_data specified as on so that the existing table data is copied during initial sync: SUGGESTION step 3: Create a subscription in node1 to subscribe to node2. Use 'copy_data = on' so that the existing table data is copied during initial sync: ~~~ 8. Commit message step 4: Adjust the publication publish settings so that truncate is not published to the subscribers and truncate the table data in node2: SUGGESTION (only added a comma) step 4: Adjust the publication publish settings so that truncate is not published to the subscribers, and truncate the table data in node2: ~~~ 9. Commit message step 5: Create a subscription in node2 to subscribe to node1. Use copy_data specified as force when creating a subscription to node1 so that the existing table data is copied during initial sync: SUGGESTION step 5: Create a subscription in node2 to subscribe to node1. Use 'copy_data = force' when creating a subscription to node1 so that the existing table data is copied during initial sync: ====== 10. doc/src/sgml/ref/create_subscription.sgml @@ -383,6 +398,15 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl can have non-existent publications. </para> + <para> + If subscription is created with <literal>origin = local</literal> and + <literal>copy_data = on</literal>, it will check if the publisher tables are + being subscribed to any other publisher and throw an error to prevent + inconsistent data in the subscription. The user can continue with the copy + operation without throwing any error in this case by specifying + <literal>copy_data = force</literal>. + </para> SUGGESTION (minor rewording) If the subscription is created with <literal>origin = local</literal> and <literal>copy_data = on</literal>, it will check if the publisher tables are being subscribed to any other publisher and, if so, then throw an error to prevent possible non-local data from being copied. The user can override this check and continue with the copy operation by specifying <literal>copy_data = force</literal>. ====== 11. src/backend/commands/subscriptioncmds.c - parse_subscription_options >From [1]: >> What about also allowing copy_data = 2, and making it equivalent to "force"? > Vignesh: I felt the existing looks ok, no need to support 2. It might confuse > the user. I don't think it would be confusing, but I also don't feel strongly enough to debate it. Anyway, I could not find a similar precedent, so your decision is fine. ~~~ 12. src/backend/commands/subscriptioncmds.c - parse_subscription_options @@ -339,17 +406,16 @@ parse_subscription_options(ParseState *pstate, List *stmt_options, errmsg("%s and %s are mutually exclusive options", "connect = false", "create_slot = true"))); - if (opts->copy_data && - IsSet(opts->specified_opts, SUBOPT_COPY_DATA)) + if (opts->copy_data && IsSet(opts->specified_opts, SUBOPT_COPY_DATA)) ereport(ERROR, This is just a formatting change. Is it needed for this patch? patch. ~~~ 13. src/backend/commands/subscriptioncmds.c - AlterSubscription_refresh @@ -730,7 +798,8 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, * PENDING, to allow ALTER SUBSCRIPTION ... REFRESH * PUBLICATION to work. */ - if (opts.twophase && !opts.copy_data && tables != NIL) + if (opts.twophase && opts.copy_data == COPY_DATA_OFF && + tables != NIL) twophase_enabled = true; Why is this change needed? I thought the original code is OK now since COPY_DATA_OFF = 0 ~~~ 14. src/backend/commands/subscriptioncmds.c - AlterSubscription @@ -1265,7 +1337,8 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, * * For more details see comments atop worker.c. */ - if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && opts.copy_data) + if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && + opts.copy_data) This is just a formatting change. Is it needed for this patch? ~~~ 15. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed + + if (!origin || (strcmp(origin, "local") != 0) || copydata != COPY_DATA_ON) + return; + This condition could be rearranged to put the strcmp last so it is not called unless absolutely necessary. ~~~ 16. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed + appendStringInfoString(&cmd, + "SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename, PS.srrelid as replicated\n" + "FROM pg_publication P,\n" The line is too long; needs wrapping. ~~~ 17. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed + if (!slot_attisnull(slot, 3)) + ereport(ERROR, + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("table:%s.%s might have replicated data in the publisher", + nspname, relname), + errdetail("CREATE/ALTER SUBSCRIPTION with origin as local and copy_data as true is not allowed when the publisher might have replicated data."), + errhint("Use CREATE/ALTER SUBSCRIPTION with copy_data = off or force.")); I felt the errmsg may be easier to read using "=" instead of "as". Anyway, it would be more consistent with the errhint. Also, change the "true" to "on" to be consistent with the errhint. SUGGESTION errdetail("CREATE/ALTER SUBSCRIPTION with origin = local and copy_data = on is not allowed when the publisher might have replicated data."), ------ [1] https://www.postgresql.org/message-id/CALDaNm3iLLxP4OV%2ByQHs-c71P6zQ9W8D30DGsve1SQs_1pFsSQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia