RE: Data is copied twice when specifying both child and parent table in publication

wangw.f...@fujitsu.com Fri, 05 Aug 2022 02:07:19 -0700

On Thur, Jul 28, 2022 at 17:17 PM Peter Smith <smithpb2...@gmail.com> wrote:
> Here are some review comments for the HEAD_v7-0001 patch:


Thanks for your comments.

> 2. Commit message.
> 
> 2a.
> 
> If there are two publications that publish the parent table and the child 
> table
> separately, and both specify the option publish_via_partition_root, 
> subscribing
> to both publications from one subscription causes initial copy twice.
> 
> SUGGESTION
> If there are two publications that publish the parent table and the child 
> table
> respectively, but both specify publish_via_partition_root = true, subscribing
> to both publications from one subscription causes initial copy twice.
> 
> 2b. <General>
> 
> Actually, isn't it more subtle than what that comment is describing?
> Maybe nobody is explicitly publishing a parent table at all. Maybe
> pub1 publishes partition1 and pub2 publishes partition2, but both
> publications are using publish_via_partition_root = true. Is this
> scenario even tested? Does the logic of pg_get_publication_tables
> cover this scenario?

=>2a.
Okay, changed it as suggested.

=>2b.
This is not the case we are trying to fix. The problematic scenario is when the
a parent table is published via root partitioned table and in this case we need
to ignore other partitions. And I try to improve the commit message to make it
clear.

> 4.
> 
> + /* Filter by final published table. */
> + foreach(lc, results)
> + {
> + Oid *table_info = (Oid *) lfirst(lc);
> +
> + if (!list_member_oid(tables, table_info[0]))
> + results = foreach_delete_current(results, lc);
>   }
> 
> The comment did not convey enough meaning. Can you make it more
> descriptive to explain why/what the logic is doing here?

I think the comments above `tables = filter_partitions(tables);` explain this.

> 5. src/backend/commands/subscriptioncmds.c - fetch_table_list
> 
>   /* Get column lists for each relation if the publisher supports it */
> - if (check_columnlist)
> - appendStringInfoString(&cmd, ", t.attnames\n");
> + if (server_version >= 160000)
> + appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname,\n"
> 
> That comment is exactly the same as it was before the patch. But it
> doesn't seem quite appropriate anymore for this new condition and this
> new query.

Improved the comments as following:
```
Get information of the tables belonging to the specified publications
```

The rest of the comments are improved as suggested.
I also rebased the patch based on the commit (0c20dd3) on HEAD, and made some
changes to the back-branch patches based on some of Peter's comments.

Attach the new patches.

Regards,
Wang wei

HEAD_v8-0001-Fix-data-replicated-twice-when-specifying-publish.patch
Description: HEAD_v8-0001-Fix-data-replicated-twice-when-specifying-publish.patch

REL15_v8-0001-Fix-data-replicated-twice-when-specifying-publish_patch
Description: REL15_v8-0001-Fix-data-replicated-twice-when-specifying-publish_patch

REL14_v8-0001-Fix-data-replicated-twice-when-specifying-publish_patch
Description: REL14_v8-0001-Fix-data-replicated-twice-when-specifying-publish_patch

REL13_v8-0001-Fix-data-replicated-twice-when-specifying-publish_patch
Description: REL13_v8-0001-Fix-data-replicated-twice-when-specifying-publish_patch

RE: Data is copied twice when specifying both child and parent table in publication

Reply via email to