Here are some review comment for the HEAD_v8 patch: ======
1. Commit message If there are two publications, one of them publish a parent table with (publish_via_partition_root = true) and another publish child table, subscribing to both publications from one subscription results in two initial replications. It should only be copied once. SUGGESTION (Note**) If there are two publications - one of them publishing a parent table (using publish_via_partition_root = true) and the other is publishing one of the parent's child tables - then subscribing to both publications from one subscription results in the same initial child data being copied twice. It should only be copied once. Note** - I've only reworded the original commit message slightly but otherwise left it saying the same thing. But I still have doubts if this message actually covers all the cases the patch is trying to address. e.g. The comment (see below) in the 'fetch_table_list' function seemed to cover more cases than what this commit message is describing. /* * Get the list of tables from publisher, the partition table whose * ancestor is also in this list will be ignored, otherwise the initial * data in the partition table would be replicated twice. */ ====== 2. src/backend/catalog/pg_publication.c - pg_get_publication_tables 2a. /* - * Returns information of tables in a publication. + * Returns information of the tables in the given publication array. */ What does "information of the tables" actually mean? Is it just the names of the tables; is it more than that? IMO a longer, more explanatory comment will be better here instead of a brief ambiguous one. 2b. + *results = NIL; This variable name 'results' is too generic, so it is not helpful when trying to understand the subsequent code logic. Please give this a meaningful name/description. 2c. /* Deconstruct the parameter into elements. */ SUGGESTION Deconstruct the parameter into elements where each element is a publication name. 2d. + List *current_tables = NIL; I think this is the tables only on the current pub_elem, so I thought 'pub_elem_tables' might make it easier to understand this list's meaning. 2e. + /* Now sort and de-duplicate the result list */ + list_sort(tables, list_oid_cmp); + list_deduplicate_oid(tables); IMO this comment is confusing because there is another list that is called the 'results' list, but that is not the same list you are processing here. Also, it does not really add anything helpful to just have comments saying almost the same as the function names (sort/de-duplicate), so please give longer comments to say the reason *why* the logic does this rather than just describing the steps. 2f. + /* Filter by final published table */ + foreach(lc, results) Please make this comment more descriptive to explain why/what the logic is doing. ====== 3. src/backend/commands/subscriptioncmds.c - fetch_table_list 3a. + bool check_columnlist = (server_version >= 150000); Given the assignment, maybe 'columnlist_supported' is a better name? 3b. + /* Get information of the tables belonging to the specified publications */ For "Get information of..." can you elaborate *what* table information this is getting and why? 3c. + if (server_version >= 160000) + appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n" + " ( CASE WHEN (array_length(GPT.attrs, 1) = C.relnatts)\n" + " THEN NULL ELSE GPT.attrs END\n" + " ) AS attnames\n" + " FROM pg_class C\n" + " JOIN pg_namespace N ON N.oid = C.relnamespace\n" + " JOIN ( SELECT (pg_get_publication_tables(VARIADIC array_agg(pubname::text))).*\n" + " FROM pg_publication\n" + " WHERE pubname IN ( %s )) as GPT\n" + " ON GPT.relid = C.oid\n", + pub_names.data); AFAICT the main reason for this check was to decide if you can use the new version of 'pg_get_publication_tables' that supports the VARIADIC array of pub names or not. If that is correct, then maybe the comment should describe that reason, or maybe add another bool var similar to the 'check_columnlist' one for this. ------ Kind Regards, Peter Smith. Fujitsu Australia