On 2022-Sep-07, Amit Kapila wrote:

> Doc: Explain about Column List feature.
> 
> Add a new logical replication section for "Column Lists" (analogous to the
> Row Filters page). This explains how the feature can be used and the
> caveats in it.
> 
> Author: Peter Smith
> Reviewed-by: Shi yu, Vignesh C, Erik Rijkers, Amit Kapila
> Backpatch-through: 15, where it was introduced
> Discussion: 
> https://postgr.es/m/CAHut+PvOuc9=_4TbASc5=vuqh16uwtfo3gzckqk_5m1hrw3...@mail.gmail.com

Hi

I just read these docs and noticed that it mentions that column lists
can be used for security.  As far as I remember, this is wrong: it is
the subscriber that builds the query to read column data during initial
sync, and the publisher doesn't forbid to read columns not in it, so it
is entirely possible for a malicious subscriber to read columns other
than those published.  I'm pretty sure we discussed this at some point
during development of the feature.

So I suggest to mention this point explicitly in its own paragraph, to
avoid giving a false sense of security.

While going over this text I found some additional things that could
--in my opinion-- stand some improvement:

* It feels better to start the section saying that a list can be
  specified; subscriber must have all those columns; omitting list
  means to publish everything. That leads to shorter text (no need to
  say "you need to have them all, oh wait you might only have a few").

* there's no reason to explain the syntax in vague terms and refer the
  reader to the reference page.

* The first few <sect2> seem to give no useful structure, and instead
  cause the text to become disorganized. I propose to remove them, and
  instead mix the paragraphs in them so that we explain the rules and
  the behavior, and lastly the effect on specific commands.

The attached patch effects those changes.


One more thing. There's a sect2 about combining column list.  Part of it
seems pretty judgmental and I see no reason to have it in there; I
propose to remove it (it's not in this patch).  I think we should just
say it doesn't work at present, here's how to work around it, and
perhaps even say that we may lift the restriction in the future.  The
paragraph that starts with "Background:" is IMO out of place, and it
repeats the mistake that column lists are for security.


Lastly: In the create-publication reference page I think it would be
better to reword the Parameters section a bit.  It mentions
FOR TABLE as a parameter, but the parameter is actually
<replaceable>table_name</replaceable>; and the row-filter and
column-list explanations are also put in there when they should be in
their own <replaceable>expression</> and <replaceable>column_name</>
varlistentries.  I think splitting things that way would result in a
clearer explanation.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 0ab191e402..3ac6336be2 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -1093,70 +1093,55 @@ test_sub=# SELECT * FROM child ORDER BY a;
   <title>Column Lists</title>
 
   <para>
-   By default, all columns of a published table will be replicated to the
-   appropriate subscribers. The subscriber table must have at least all the
-   columns of the published table. However, if a
-   <firstterm>column list</firstterm> is specified then only the columns named
-   in the list will be replicated. This means the subscriber-side table only
-   needs to have those columns named by the column list. A user might choose to
-   use column lists for behavioral, security or performance reasons.
+   Each publication can optionally specify which columns of each table are
+   replicated to subscribers. The table on the subscriber side must have at
+   least all the columns that are published. If no column list is specified,
+   then all columns in the publisher are replicated.
+   See <xref linkend="sql-createpublication"/> for details on the syntax.
   </para>
 
-  <sect2 id="logical-replication-col-list-rules">
-   <title>Column List Rules</title>
+  <para>
+   The choice of columns can be based on behavioral or performance reasons.
+   However, do not rely on this feature for security: a malicious subscriber
+   is able to obtain data from columns that are not specifically
+   published.  If security is a consideration, protections can be applied
+   at the publisher side.
+  </para>
 
-   <para>
-    A column list is specified per table following the table name, and enclosed
-    by parentheses. See <xref linkend="sql-createpublication"/> for details.
-   </para>
+  <para>
+   If no column list is specified, all columns of the table are replicated
+   through this publication, including any columns added later. This means
+   that having a column list which names all columns is not the same as having
+   no column list at all: if columns are added to the table afterwards, those
+   will not be replicated.
+  </para>
 
-   <para>
-    When specifying a column list, the order of columns is not important. If no
-    column list is specified, all columns of the table are replicated through
-    this publication, including any columns added later. This means a column
-    list which names all columns is not quite the same as having no column list
-    at all. For example, if additional columns are added to the table then only
-    those named columns mentioned in the column list will continue to be
-    replicated.
-   </para>
+  <para>
+   A column list can contain only simple column references.  The order
+   of columns in the list is not preserved.
+  </para>
 
-   <para>
-    Column lists have no effect for <literal>TRUNCATE</literal> command.
-   </para>
+  <para>
+   For partitioned tables, the publication parameter
+   <literal>publish_via_partition_root</literal> determines which column list
+   is used. If <literal>publish_via_partition_root</literal> is
+   <literal>true</literal>, the root partitioned table's column list is used.
+   Otherwise, if <literal>publish_via_partition_root</literal> is
+   <literal>false</literal> (the default), each partition's column list is used.
+  </para>
 
-  </sect2>
+  <para>
+   If a publication publishes <command>UPDATE</command> or
+   <command>DELETE</command> operations, any column list must include the
+   table's replica identity columns (see
+   <xref linkend="sql-altertable-replica-identity"/>).
+   If a publication publishes only <command>INSERT</command> operations, then
+   the column list may omit replica identity columns.
+  </para>
 
-  <sect2 id="logical-replication-col-list-restrictions">
-   <title>Column List Restrictions</title>
-
-   <para>
-    A column list can contain only simple column references.
-   </para>
-
-   <para>
-    If a publication publishes <command>UPDATE</command> or
-    <command>DELETE</command> operations, any column list must include the
-    table's replica identity columns (see
-    <xref linkend="sql-altertable-replica-identity"/>).
-    If a publication publishes only <command>INSERT</command> operations, then
-    the column list is arbitrary and may omit some replica identity columns.
-   </para>
-
-  </sect2>
-
-  <sect2 id="logical-replication-col-list-partitioned">
-   <title>Partitioned Tables</title>
-
-   <para>
-    For partitioned tables, the publication parameter
-    <literal>publish_via_partition_root</literal> determines which column list
-    is used. If <literal>publish_via_partition_root</literal> is
-    <literal>true</literal>, the root partitioned table's column list is used.
-    Otherwise, if <literal>publish_via_partition_root</literal> is
-    <literal>false</literal> (default), each partition's column list is used.
-   </para>
-
-  </sect2>
+  <para>
+   Column lists have no effect for the <literal>TRUNCATE</literal> command.
+  </para>
 
   <sect2 id="logical-replication-col-list-initial-data-sync">
    <title>Initial Data Synchronization</title>
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index f61641896a..0a68c4bf73 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -94,7 +94,7 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
       effect on <literal>TRUNCATE</literal> commands. See
       <xref linkend="logical-replication-col-lists"/> for details about column
       lists.
-</para>
+     </para>
 
      <para>
       Only persistent base tables and partitioned tables can be part of a

Reply via email to