Here are some review comments for the v17-0001 patch. ~~~
1. Commit message If no column list is specified, all the columns are replicated, as previously Missing period (.) at the end of that sentence. ~~~ 2. doc/src/sgml/catalogs.sgml + <para> + This is an array of values that indicates which table columns are + part of the publication. For example a value of <literal>1 3</literal> + would mean that the first and the third table columns are published. + A null value indicates that all attributes are published. + </para></entry> Missing comma: "For example" --> "For example," Terms: The text seems to jump between "columns" and "attributes". Perhaps, for consistency, that last sentence should say: "A null value indicates that all columns are published." ~~~ 3. doc/src/sgml/protocol.sgml </variablelist> - Next, the following message part appears for each column (except generated columns): + Next, the following message part appears for each column (except + generated columns and other columns that don't appear in the column + filter list, for tables that have one): <variablelist> Perhaps that can be expressed more simply, like: Next, the following message part appears for each column (except generated columns and other columns not present in the optional column filter list): ~~~ 4. doc/src/sgml/ref/alter_publication.sgml +ALTER PUBLICATION <replaceable class="parameter">name</replaceable> ALTER TABLE <replaceable class="parameter">publication_object</replaceable> SET COLUMNS { ( <replaceable class="parameter">name</replaceable> [, ...] ) | ALL } The syntax chart looks strange because there is already a "TABLE" and a column_name list within the "publication_object" definition, so do ALTER TABLE and publication_object co-exist? According to the current documentation it suggests nonsense like below is valid: ALTER PUBLICATION mypublication ALTER TABLE TABLE t1 (a,b,c) SET COLUMNS (a,b,c); -- But more fundamentally, I don't see why any new syntax is even needed at all. Instead of: ALTER PUBLICATION mypublication ALTER TABLE users SET COLUMNS (user_id, firstname, lastname); Why not just: ALTER PUBLICATION mypublication ALTER TABLE users (user_id, firstname, lastname); Then, if the altered table defines a *different* column list then it would be functionally equivalent to whatever your SET COLUMNS is doing now. AFAIK this is how the Row-Filter [1] works, so that altering an existing table to have a different Row-Filter just overwrites that table's filter. IMO the Col-Filter behaviour should work the same as that - "SET COLUMNS" is redundant. ~~~ 5. doc/src/sgml/ref/alter_publication.sgml - TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [, ... ] + TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ ( <replaceable class="parameter">column_name</replaceable>, [, ... ] ) ] [, ... ] That extra comma after the "column_name" seems wrong because there is one already in "[, ... ]". ~~~ 6. doc/src/sgml/ref/create_publication.sgml - TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [, ... ] + TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ ( <replaceable class="parameter">column_name</replaceable>, [, ... ] ) ] [, ... ] (Same as comment #5). That extra comma after the "column_name" seems wrong because there is one already in "[, ... ]". ~~~ 7. doc/src/sgml/ref/create_publication.sgml + <para> + When a column list is specified, only the listed columns are replicated; + any other columns are ignored for the purpose of replication through + this publication. If no column list is specified, all columns of the + table are replicated through this publication, including any columns + added later. If a column list is specified, it must include the replica + identity columns. + </para> Suggest to re-word this a bit simpler: e.g. - "listed columns" --> "named columns" - I don't think it is necessary to say the unlisted columns are ignored. - I didn't think it is necessary to say "though this publication" AFTER When a column list is specified, only the named columns are replicated. If no column list is specified, all columns of the table are replicated, including any columns added later. If a column list is specified, it must include the replica identity columns. ~~~ 8. doc/src/sgml/ref/create_publication.sgml Consider adding another example showing a CREATE PUBLICATION which has a column list. ~~~ 9. src/backend/catalog/pg_publication.c - check_publication_add_relation /* - * Check if relation can be in given publication and throws appropriate - * error if not. + * Check if relation can be in given publication and that the column + * filter is sensible, and throws appropriate error if not. + * + * targetcols is the bitmapset of attribute numbers given in the column list, + * or NULL if it was not specified. */ Typo: "targetcols" --> "columns" ?? ~~~ 10. src/backend/catalog/pg_publication.c - check_publication_add_relation + + /* Make sure the column list checks out */ + if (columns != NULL) + { Perhaps "checks out" could be worded better. ~~~ 11. src/backend/catalog/pg_publication.c - check_publication_add_relation + /* Make sure the column list checks out */ + if (columns != NULL) + { + /* + * Even if the user listed all columns in the column list, we cannot + * allow a column list to be specified when REPLICA IDENTITY is FULL; + * that would cause problems if a new column is added later, because + * the new column would have to be included (because of being part of + * the replica identity) but it's technically not allowed (because of + * not being in the publication's column list yet). So reject this + * case altogether. + */ + if (replidentfull) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("invalid column list for publishing relation \"%s\"", + RelationGetRelationName(targetrel)), + errdetail("Cannot specify a column list on relations with REPLICA IDENTITY FULL.")); + + check_publication_columns(pub, targetrel, columns); + } IIUC almost all of the above comment and code is redundant because by calling the check_publication_columns function it will do exactly the same check... So, that entire slab might be replaced by 2 lines: if (columns != NULL) check_publication_columns(pub, targetrel, columns); ~~~ 12. src/backend/catalog/pg_publication.c - publication_set_table_columns +publication_set_table_columns(Relation pubrel, HeapTuple pubreltup, + Relation targetrel, List *columns) +{ + Bitmapset *attset; + AttrNumber *attarray; + HeapTuple copytup; + int natts; + bool nulls[Natts_pg_publication_rel]; + bool replaces[Natts_pg_publication_rel]; + Datum values[Natts_pg_publication_rel]; + + memset(values, 0, sizeof(values)); + memset(nulls, 0, sizeof(nulls)); + memset(replaces, false, sizeof(replaces)); It seemed curious to use memset false for "replaces" but memset 0 for "nulls", since they are both bool arrays (??) ~~~ 13. src/backend/catalog/pg_publication.c - compare_int16 +/* qsort comparator for attnums */ +static int +compare_int16(const void *a, const void *b) +{ + int av = *(const int16 *) a; + int bv = *(const int16 *) b; + + /* this can't overflow if int is wider than int16 */ + return (av - bv); +} This comparator seems common with another one already in the PG source. Perhaps it would be better for generic comparators (like this one) to be in some common code instead of scattered cut/paste copies of the same thing. ~~~ 14. src/backend/commands/publicationcmds.c - AlterPublicationTables + else if (stmt->action == AP_SetColumns) + { + Assert(schemaidlist == NIL); + Assert(list_length(tables) == 1); + + PublicationSetColumns(stmt, pubform, + linitial_node(PublicationTable, tables)); + } (Same as my earlier review comment #4) Suggest to call this PublicationSetColumns based on some smarter detection logic of a changed column list. Please refer to the Row-Filter patch [1] for this same function. ~~~ 15. src/backend/commands/publicationcmds.c - AlterPublicationTables + /* This is not needed to delete a table */ + pubrel->columns = NIL; Perhaps a more explanatory comment would be better there? ~~~ 16. src/backend/commands/tablecmds.c - relation_mark_replica_identity @@ -15841,6 +15871,7 @@ relation_mark_replica_identity(Relation rel, char ri_type, Oid indexOid, CatalogTupleUpdate(pg_index, &pg_index_tuple->t_self, pg_index_tuple); InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0, InvalidOid, is_internal); + /* * Invalidate the relcache for the table, so that after we commit * all sessions will refresh the table's replica identity index Spurious whitespace change seemed unrelated to the Col-Filter patch. ~~~ 17. src/backend/parser/gram.y * + * ALTER PUBLICATION name SET COLUMNS table_name (column[, ...]) + * ALTER PUBLICATION name SET COLUMNS table_name ALL + * (Same as my earlier review comment #4) IMO there was no need for the new syntax of SET COLUMNS. ~~~ 18. src/backend/replication/logical/proto.c - logicalrep_write_attrs - /* send number of live attributes */ - for (i = 0; i < desc->natts; i++) - { - if (TupleDescAttr(desc, i)->attisdropped || TupleDescAttr(desc, i)->attgenerated) - continue; - nliveatts++; - } - pq_sendint16(out, nliveatts); - /* fetch bitmap of REPLICATION IDENTITY attributes */ replidentfull = (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL); if (!replidentfull) idattrs = RelationGetIdentityKeyBitmap(rel); + /* send number of live attributes */ + for (i = 0; i < desc->natts; i++) + { + Form_pg_attribute att = TupleDescAttr(desc, i); + + if (att->attisdropped || att->attgenerated) + continue; + if (columns != NULL && !bms_is_member(att->attnum, columns)) + continue; + nliveatts++; + } + pq_sendint16(out, nliveatts); + This change seemed to have the effect of moving that 4 lines of "replidentfull" code from below the loop to above the loop. But moving that code seems unrelated to the Col-Filter patch. (??). ~~~ 19. src/backend/replication/logical/tablesync.c - fetch_remote_table_info @@ -793,12 +877,12 @@ fetch_remote_table_info(char *nspname, char *relname, ExecClearTuple(slot); } + ExecDropSingleTupleTableSlot(slot); - - lrel->natts = natt; - walrcv_clear_result(res); pfree(cmd.data); + + lrel->natts = natt; } The shuffling of those few lines seems unrelated to any requirement of the Col-Filter patch (??) ~~~ 20. src/backend/replication/logical/tablesync.c - copy_table + for (int i = 0; i < lrel.natts; i++) + { + appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i])); + if (i < lrel.natts - 1) + appendStringInfoString(&cmd, ", "); + } Perhaps that could be expressed more simply if the other way around like: for (int i = 0; i < lrel.natts; i++) { if (i) appendStringInfoString(&cmd, ", "); appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i])); } ~~~ 21. src/backend/replication/pgoutput/pgoutput.c + + /* + * Set of columns included in the publication, or NULL if all columns are + * included implicitly. Note that the attnums in this list are not + * shifted by FirstLowInvalidHeapAttributeNumber. + */ + Bitmapset *columns; Typo: "in this list" --> "in this set" (??) ~~~ 22. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry * Don't publish changes for partitioned tables, because - * publishing those of its partitions suffices, unless partition - * changes won't be published due to pubviaroot being set. + * publishing those of its partitions suffices. (However, ignore + * this if partition changes are not to published due to + * pubviaroot being set.) */ This change seems unrelated to the Col-Filter patch, so perhaps it should not be here at all. Also, typo: "are not to published" ~~~ 23. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry + /* + * Obtain columns published by this publication, and add them + * to the list for this rel. Note that if at least one + * publication has a empty column list, that means to publish + * everything; so if we saw a publication that includes all + * columns, skip this. + */ Typo: "a empty" --> "an empty" ~~~ 24. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry + if (isnull) + { + /* + * If we see a publication with no columns, reset the + * list and ignore further ones. + */ Perhaps that comment is meant to say "with no column filter" instead of "with no columns"? ~~~ 25. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry + if (isnull) + { ... + } + else if (!isnull) + { ... + } Is the "if (!isnull)" in the else just to be really REALLY sure it is not null? ~~~ 26. src/bin/pg_dump/pg_dump.c - getPublicationTables + pubrinfo[i].pubrattrs = attribs->data; + } + else + pubrinfo[j].pubrattrs = NULL; I got confused reading this code. Are those different indices 'i' and 'j' correct? ~~~ 27. src/bin/psql/describe.c The Row-Filter [1] displays filter information not only for the psql \dRp+ command but also for the psql \d <tablename> command. Perhaps the Col-Filter patch should do that too. ~~~ 28. src/bin/psql/tab-complete.c @@ -1657,6 +1657,8 @@ psql_completion(const char *text, int start, int end) /* ALTER PUBLICATION <name> ADD */ else if (Matches("ALTER", "PUBLICATION", MatchAny, "ADD")) COMPLETE_WITH("ALL TABLES IN SCHEMA", "TABLE"); + else if (Matches("ALTER", "PUBLICATION", MatchAny, "ADD", "TABLE")) + COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL); /* ALTER PUBLICATION <name> DROP */ I am not sure about this one- is that change even related to the Col-Filter patch or is this some unrelated bugfix? ~~~ 29. src/include/catalog/pg_publication.h @@ -86,6 +86,7 @@ typedef struct Publication typedef struct PublicationRelInfo { Relation relation; + List *columns; } PublicationRelInfo; Perhaps that needs some comment. e.g. do you need to mention that a NIL List means all columns? ~~~ 30. src/include/nodes/parsenodes.h @@ -3642,6 +3642,7 @@ typedef struct PublicationTable { NodeTag type; RangeVar *relation; /* relation to be published */ + List *columns; /* List of columns in a publication table */ } PublicationTable; That comment "List of columns in a publication table" doesn't really say anything helpful. Perhaps it should mention that a NIL List means all table columns? ~~~ 31. src/test/regress/sql/publication.sql The regression test file has an uncommon mixture of /* */ and -- style comments. Perhaps change all the /* */ ones? ~~~ 32. src/test/regress/sql/publication.sql +CREATE TABLE testpub_tbl5 (a int PRIMARY KEY, b text, c text, + d int generated always as (a + length(b)) stored); +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, x); -- error +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (b, c); -- error +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d); -- error +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, c); -- ok For all these tests (and more) there seems not sufficient explanation comments to say exactly what each test case is testing, e.g. *why* is an "error" expected for some cases but "ok" for others. ~~~ 33. src/test/regress/sql/publication.sql "-- no dice" (??) confusing comment. ~~~ 34. src/test/subscription/t/028_column_list.pl I think a few more comments in this TAP file would help to make the purpose of the tests more clear. ------ [1] https://www.postgresql.org/message-id/flat/CAHut%2BPtNWXPba0h%3Ddo_UiwaEziePNr7Z%2B58%2B-ctpyP2Pq1VkPw%40mail.gmail.com#76afd191811cba236198f62e60f44ade Kind Regards, Peter Smith. Fujitsu Australia