On Tues, Aug 9, 2022 at 15:15 PM Peter Smith <smithpb2...@gmail.com> wrote: > Here are some review comment for the HEAD_v8 patch:
Thanks for your comments. > 1. Commit message > > If there are two publications, one of them publish a parent table with > (publish_via_partition_root = true) and another publish child table, > subscribing to both publications from one subscription results in two initial > replications. It should only be copied once. > > SUGGESTION (Note**) > If there are two publications - one of them publishing a parent table > (using publish_via_partition_root = true) and the other is publishing > one of the parent's child tables - then subscribing to both > publications from one subscription results in the same initial child > data being copied twice. It should only be copied once. > > > Note** - I've only reworded the original commit message slightly but > otherwise left it saying the same thing. But I still have doubts if > this message actually covers all the cases the patch is trying to > address. e.g. The comment (see below) in the 'fetch_table_list' > function seemed to cover more cases than what this commit message is > describing. > /* > * Get the list of tables from publisher, the partition table whose > * ancestor is also in this list will be ignored, otherwise the initial > * data in the partition table would be replicated twice. > */ => commit message Changed. => Note** I think the commit message and the comment you mentioned refer to the same kind of scenario. > 2. src/backend/catalog/pg_publication.c - pg_get_publication_tables > > 2a. > /* > - * Returns information of tables in a publication. > + * Returns information of the tables in the given publication array. > */ > > What does "information of the tables" actually mean? Is it just the > names of the tables; is it more than that? IMO a longer, more > explanatory comment will be better here instead of a brief ambiguous > one. Changed as below: ``` Get information of the tables in the given publication array. Returns the oid, column list, row filter for each table. ``` > 2b. > + *results = NIL; > > This variable name 'results' is too generic, so it is not helpful when > trying to understand the subsequent code logic. Please give this a > meaningful name/description. Changed the variable name as below: results -> table_infos > 2c. > /* Deconstruct the parameter into elements. */ > > SUGGESTION > Deconstruct the parameter into elements where each element is a > publication name. Changed. > 2d. > + List *current_tables = NIL; > > I think this is the tables only on the current pub_elem, so I thought > 'pub_elem_tables' might make it easier to understand this list's > meaning. Changed. > 2e. > + /* Now sort and de-duplicate the result list */ > + list_sort(tables, list_oid_cmp); > + list_deduplicate_oid(tables); > > IMO this comment is confusing because there is another list that is > called the 'results' list, but that is not the same list you are > processing here. Also, it does not really add anything helpful to just > have comments saying almost the same as the function names > (sort/de-duplicate), so please give longer comments to say the reason > *why* the logic does this rather than just describing the steps. Fixed the comment. ("result" -> "tables") I think the purpose of these two functions is clear. The reason I added the comment here is for consistency with other calling locations. > 2f. > + /* Filter by final published table */ > + foreach(lc, results) > > Please make this comment more descriptive to explain why/what the > logic is doing. Changed as below: ``` For tables that have been filtered out, delete the corresponding table information in the table_infos list. ``` > 3. src/backend/commands/subscriptioncmds.c - fetch_table_list > > 3a. > + bool check_columnlist = (server_version >= 150000); > > Given the assignment, maybe 'columnlist_supported' is a better name? I am not sure if this name could be changed in this thread. > 3b. > + /* Get information of the tables belonging to the specified publications */ > > For "Get information of..." can you elaborate *what* table > information this is getting and why? I am not sure if we need to add a reason. So, I only added what information we are going to get: ``` Get namespace, relname and column list (if supported) of the tables belonging to the specified publications. ``` > 3c. > + if (server_version >= 160000) > + appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n" > + " ( CASE WHEN (array_length(GPT.attrs, 1) = C.relnatts)\n" > + " THEN NULL ELSE GPT.attrs END\n" > + " ) AS attnames\n" > + " FROM pg_class C\n" > + " JOIN pg_namespace N ON N.oid = C.relnamespace\n" > + " JOIN ( SELECT (pg_get_publication_tables(VARIADIC > array_agg(pubname::text))).*\n" > + " FROM pg_publication\n" > + " WHERE pubname IN ( %s )) as GPT\n" > + " ON GPT.relid = C.oid\n", > + pub_names.data); > > AFAICT the main reason for this check was to decide if you can use the > new version of 'pg_get_publication_tables' that supports the VARIADIC > array of pub names or not. If that is correct, then maybe the comment > should describe that reason, or maybe add another bool var similar to > the 'check_columnlist' one for this. I added the comment as following: ``` From version 16, the parameter of the function pg_get_publication_tables can be an array of publications. The partition table whose ancestor is also published in this publication array will be filtered out in this function. ``` I also rebased the REL_15 patch based on the commit (15014b8), and made some changes to the back-branch patches based on Peter's suggestions. Attach the new patches. Regards, Wang wei
HEAD_v9-0001-Fix-data-replicated-twice-when-specifying-publish.patch
Description: HEAD_v9-0001-Fix-data-replicated-twice-when-specifying-publish.patch
REL15_v9-0001-Fix-data-replicated-twice-when-specifying-publish_patch
Description: REL15_v9-0001-Fix-data-replicated-twice-when-specifying-publish_patch
REL14_v9-0001-Fix-data-replicated-twice-when-specifying-publish_patch
Description: REL14_v9-0001-Fix-data-replicated-twice-when-specifying-publish_patch
REL13_v9-0001-Fix-data-replicated-twice-when-specifying-publish_patch
Description: REL13_v9-0001-Fix-data-replicated-twice-when-specifying-publish_patch