Here are some review comments for the HEAD_v7-0001 patch: ======
1. <General> I have a fundamental question about this patch. IIUC the purpose of this patch is to ensure that (when publish_via_root = true) the copy of the partition data will happen only once (e.g. from one parent table on one of the publishers). But I think there is no guarantee that those 2 publishers even had the same data, right? Therefore it seems to me you could get different results if the data were copied from pub1 or from pub2. (I have not tried it - this is just my suspicion). Am I correct or mistaken? If correct, then it means there is a big (but subtle) difference related to the ordering of processing ... A) is this explicitly documented so the user knows what data to expect? B) is the effect of different ordering tested anywhere? Actually, I have no idea what exactly determines the order – is it the original CREATE SUBSCRIPTION publication list order? Is it the logic of the pg_get_publication_tables function? Is it the SQL in function fetch_table_list? Or is it not deterministic at all? Please confirm it. ====== 2. Commit message. 2a. If there are two publications that publish the parent table and the child table separately, and both specify the option publish_via_partition_root, subscribing to both publications from one subscription causes initial copy twice. SUGGESTION If there are two publications that publish the parent table and the child table respectively, but both specify publish_via_partition_root = true, subscribing to both publications from one subscription causes initial copy twice. 2b. <General> Actually, isn't it more subtle than what that comment is describing? Maybe nobody is explicitly publishing a parent table at all. Maybe pub1 publishes partition1 and pub2 publishes partition2, but both publications are using publish_via_partition_root = true. Is this scenario even tested? Does the logic of pg_get_publication_tables cover this scenario? ====== 3. src/backend/catalog/pg_publication.c - pg_get_publication_tables pg_get_publication_tables(PG_FUNCTION_ARGS) { #define NUM_PUBLICATION_TABLES_ELEM 3 - FuncCallContext *funcctx; - char *pubname = text_to_cstring(PG_GETARG_TEXT_PP(0)); - Publication *publication; - List *tables; + FuncCallContext *funcctx; + Publication *publication; Something seems strange about having a common Publication declaration. Firstly it is used to represent every publication element in the array loop. Later, it is overwritten to represent a single publication. I think it might be easier if you declare these variables separately: E.g.1 Publication *pub_elem; -- for the array element processing declared within the for loop E.g.2 Publication *pub; -- declared within if (funcctx->call_cntr < list_length(results)) ~~~ 4. + /* Filter by final published table. */ + foreach(lc, results) + { + Oid *table_info = (Oid *) lfirst(lc); + + if (!list_member_oid(tables, table_info[0])) + results = foreach_delete_current(results, lc); } The comment did not convey enough meaning. Can you make it more descriptive to explain why/what the logic is doing here? ====== 5. src/backend/commands/subscriptioncmds.c - fetch_table_list /* Get column lists for each relation if the publisher supports it */ - if (check_columnlist) - appendStringInfoString(&cmd, ", t.attnames\n"); + if (server_version >= 160000) + appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname,\n" That comment is exactly the same as it was before the patch. But it doesn't seem quite appropriate anymore for this new condition and this new query. ~~~ 6. + /* + * Get the list of tables from publisher, the partition table whose + * ancestor is also in this list should be ignored, otherwise the + * initial data in the partition table would be replicated twice. + */ Why say "should be ignored" -- don’t you mean "will be" or "must be" or "is". ~~~ 7. initStringInfo(&cmd); - appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename \n"); /* Get column lists for each relation if the publisher supports it */ - if (check_columnlist) - appendStringInfoString(&cmd, ", t.attnames\n"); + if (server_version >= 160000) + appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname,\n" + " ( CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)\n" + " THEN NULL ELSE gpt.attrs END\n" + " ) AS attnames\n" + " FROM pg_class c\n" + " JOIN pg_namespace n ON n.oid = c.relnamespace\n" + " JOIN ( SELECT (pg_get_publication_tables(VARIADIC array_agg(pubname::text))).*\n" + " FROM pg_publication\n" + " WHERE pubname IN ( %s )) as gpt\n" + " ON gpt.relid = c.oid\n", + pub_names.data); + else + { + /* + * Get the list of tables from publisher, the partition table whose + * ancestor is also in this list should be ignored, otherwise the + * initial data in the partition table would be replicated twice. + */ - appendStringInfoString(&cmd, "FROM pg_catalog.pg_publication_tables t\n" - " WHERE t.pubname IN ("); - get_publications_str(publications, &cmd, true); - appendStringInfoChar(&cmd, ')'); + appendStringInfoString(&cmd, "WITH pub_tabs AS(\n" + " SELECT DISTINCT N.nspname, C.oid, C.relname, C.relispartition\n"); + + /* Get column lists for each relation if the publisher supports it */ + if (check_columnlist) + appendStringInfoString(&cmd, ",( CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)\n" + " THEN NULL ELSE gpt.attrs END\n" + " ) AS attnames\n"); + + appendStringInfo(&cmd, " FROM pg_publication P,\n" + " LATERAL pg_get_publication_tables(P.pubname) GPT,\n" + " pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n" + " WHERE C.oid = GPT.relid AND P.pubname IN ( %s )\n" + ")\n" + "SELECT DISTINCT pub_tabs.nspname, pub_tabs.relname\n", + pub_names.data); + + /* Get column lists for each relation if the publisher supports it */ + if (check_columnlist) + appendStringInfoString(&cmd, ", pub_tabs.attnames\n"); + + appendStringInfoString(&cmd, "FROM pub_tabs\n" + " WHERE (pub_tabs.relispartition IS FALSE\n" + " OR NOT EXISTS (SELECT 1 FROM pg_partition_ancestors(pub_tabs.oid) as pa\n" + " WHERE pa.relid IN (SELECT pub_tabs.oid FROM pub_tabs)\n" + " AND pa.relid != pub_tabs.oid))\n"); + } Please use a consistent case for all the SQL aliases. E.g "gpt" versus "GPT", "c" versus "C", etc. ====== 8. src/test/subscription/t/013_partition.pl +# Note: We only create one table for the partitioned table (tab4) here. Because +# we specify option "publish_via_partition_root" (see pub_all and +# pub_lower_level above), all data should be replicated to the partitioned +# table. So we do not need to create table for the partition table. "replicated to the partitioned table" ?? The entire comment seems a bit misleading because how can we call the subscriber table a "partitioned" table when it has no partitions?! SUGGESTION (maybe?) Note: We only create one table (tab4) here. We specified publish_via_partition_root = true (see pub_all and pub_lower_level above), so all data will be replicated to that table. ------ Kind Regards, Peter Smith. Fujitsu Australia