On 2022-Sep-07, Amit Kapila wrote: > Doc: Explain about Column List feature. > > Add a new logical replication section for "Column Lists" (analogous to the > Row Filters page). This explains how the feature can be used and the > caveats in it. > > Author: Peter Smith > Reviewed-by: Shi yu, Vignesh C, Erik Rijkers, Amit Kapila > Backpatch-through: 15, where it was introduced > Discussion: > https://postgr.es/m/CAHut+PvOuc9=_4TbASc5=vuqh16uwtfo3gzckqk_5m1hrw3...@mail.gmail.com
Hi I just read these docs and noticed that it mentions that column lists can be used for security. As far as I remember, this is wrong: it is the subscriber that builds the query to read column data during initial sync, and the publisher doesn't forbid to read columns not in it, so it is entirely possible for a malicious subscriber to read columns other than those published. I'm pretty sure we discussed this at some point during development of the feature. So I suggest to mention this point explicitly in its own paragraph, to avoid giving a false sense of security. While going over this text I found some additional things that could --in my opinion-- stand some improvement: * It feels better to start the section saying that a list can be specified; subscriber must have all those columns; omitting list means to publish everything. That leads to shorter text (no need to say "you need to have them all, oh wait you might only have a few"). * there's no reason to explain the syntax in vague terms and refer the reader to the reference page. * The first few <sect2> seem to give no useful structure, and instead cause the text to become disorganized. I propose to remove them, and instead mix the paragraphs in them so that we explain the rules and the behavior, and lastly the effect on specific commands. The attached patch effects those changes. One more thing. There's a sect2 about combining column list. Part of it seems pretty judgmental and I see no reason to have it in there; I propose to remove it (it's not in this patch). I think we should just say it doesn't work at present, here's how to work around it, and perhaps even say that we may lift the restriction in the future. The paragraph that starts with "Background:" is IMO out of place, and it repeats the mistake that column lists are for security. Lastly: In the create-publication reference page I think it would be better to reword the Parameters section a bit. It mentions FOR TABLE as a parameter, but the parameter is actually <replaceable>table_name</replaceable>; and the row-filter and column-list explanations are also put in there when they should be in their own <replaceable>expression</> and <replaceable>column_name</> varlistentries. I think splitting things that way would result in a clearer explanation. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml index 0ab191e402..3ac6336be2 100644 --- a/doc/src/sgml/logical-replication.sgml +++ b/doc/src/sgml/logical-replication.sgml @@ -1093,70 +1093,55 @@ test_sub=# SELECT * FROM child ORDER BY a; <title>Column Lists</title> <para> - By default, all columns of a published table will be replicated to the - appropriate subscribers. The subscriber table must have at least all the - columns of the published table. However, if a - <firstterm>column list</firstterm> is specified then only the columns named - in the list will be replicated. This means the subscriber-side table only - needs to have those columns named by the column list. A user might choose to - use column lists for behavioral, security or performance reasons. + Each publication can optionally specify which columns of each table are + replicated to subscribers. The table on the subscriber side must have at + least all the columns that are published. If no column list is specified, + then all columns in the publisher are replicated. + See <xref linkend="sql-createpublication"/> for details on the syntax. </para> - <sect2 id="logical-replication-col-list-rules"> - <title>Column List Rules</title> + <para> + The choice of columns can be based on behavioral or performance reasons. + However, do not rely on this feature for security: a malicious subscriber + is able to obtain data from columns that are not specifically + published. If security is a consideration, protections can be applied + at the publisher side. + </para> - <para> - A column list is specified per table following the table name, and enclosed - by parentheses. See <xref linkend="sql-createpublication"/> for details. - </para> + <para> + If no column list is specified, all columns of the table are replicated + through this publication, including any columns added later. This means + that having a column list which names all columns is not the same as having + no column list at all: if columns are added to the table afterwards, those + will not be replicated. + </para> - <para> - When specifying a column list, the order of columns is not important. If no - column list is specified, all columns of the table are replicated through - this publication, including any columns added later. This means a column - list which names all columns is not quite the same as having no column list - at all. For example, if additional columns are added to the table then only - those named columns mentioned in the column list will continue to be - replicated. - </para> + <para> + A column list can contain only simple column references. The order + of columns in the list is not preserved. + </para> - <para> - Column lists have no effect for <literal>TRUNCATE</literal> command. - </para> + <para> + For partitioned tables, the publication parameter + <literal>publish_via_partition_root</literal> determines which column list + is used. If <literal>publish_via_partition_root</literal> is + <literal>true</literal>, the root partitioned table's column list is used. + Otherwise, if <literal>publish_via_partition_root</literal> is + <literal>false</literal> (the default), each partition's column list is used. + </para> - </sect2> + <para> + If a publication publishes <command>UPDATE</command> or + <command>DELETE</command> operations, any column list must include the + table's replica identity columns (see + <xref linkend="sql-altertable-replica-identity"/>). + If a publication publishes only <command>INSERT</command> operations, then + the column list may omit replica identity columns. + </para> - <sect2 id="logical-replication-col-list-restrictions"> - <title>Column List Restrictions</title> - - <para> - A column list can contain only simple column references. - </para> - - <para> - If a publication publishes <command>UPDATE</command> or - <command>DELETE</command> operations, any column list must include the - table's replica identity columns (see - <xref linkend="sql-altertable-replica-identity"/>). - If a publication publishes only <command>INSERT</command> operations, then - the column list is arbitrary and may omit some replica identity columns. - </para> - - </sect2> - - <sect2 id="logical-replication-col-list-partitioned"> - <title>Partitioned Tables</title> - - <para> - For partitioned tables, the publication parameter - <literal>publish_via_partition_root</literal> determines which column list - is used. If <literal>publish_via_partition_root</literal> is - <literal>true</literal>, the root partitioned table's column list is used. - Otherwise, if <literal>publish_via_partition_root</literal> is - <literal>false</literal> (default), each partition's column list is used. - </para> - - </sect2> + <para> + Column lists have no effect for the <literal>TRUNCATE</literal> command. + </para> <sect2 id="logical-replication-col-list-initial-data-sync"> <title>Initial Data Synchronization</title> diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml index f61641896a..0a68c4bf73 100644 --- a/doc/src/sgml/ref/create_publication.sgml +++ b/doc/src/sgml/ref/create_publication.sgml @@ -94,7 +94,7 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable> effect on <literal>TRUNCATE</literal> commands. See <xref linkend="logical-replication-col-lists"/> for details about column lists. -</para> + </para> <para> Only persistent base tables and partitioned tables can be part of a