Re: Column Filtering in Logical Replication

Amit Kapila Fri, 17 Dec 2021 20:07:34 -0800

On Fri, Dec 17, 2021 at 3:16 PM houzj.f...@fujitsu.com
<houzj.f...@fujitsu.com> wrote:
>
> On Friday, December 17, 2021 1:55 AM Alvaro Herrera <alvhe...@alvh.no-ip.org> 
> wrote:
> > On 2021-Dec-16, houzj.f...@fujitsu.com wrote:
> >
> > > The patch ensures all columns of RT are in column list when
> > > CREATE/ALTER publication, but it seems doesn't prevent user from
> > > changing the replica identity or dropping the index used in replica
> > > identity. Do we also need to check those cases ?
> >
> > Yes, we do.  As it happens, I spent a couple of hours yesterday writing 
> > code for
> > that, at least partially.  I haven't yet checked what happens with cases 
> > like
> > REPLICA NOTHING, or REPLICA INDEX <xyz> and then dropping that index.
> >
> > My initial ideas were a bit wrong BTW: I thought we should check the
> > combination of column lists in all publications (a bitwise-OR of column 
> > bitmaps,
> > so to speak).  But conceptually that's wrong: we need to check the column 
> > list
> > of each publication individually instead.  Otherwise, if you wanted to hide 
> > a
> > column from some publication but that column was part of the replica 
> > identity,
> > there'd be no way to identify the tuple in the replica.  (Or, if the 
> > pgouput code
> > disobeys the column list and sends the replica identity even if it's not in 
> > the
> > column list, then you'd be potentially publishing data that you wanted to 
> > hide.)
>
> Thanks for the explanation.
>
> Apart from ALTER REPLICA IDENTITY and DROP INDEX, I think there could be
> some other cases we need to handle for the replica identity check:
>
> 1)
> When adding a partitioned table with column list to the publication, I think 
> we
> need to check the RI of all its leaf partition. Because the RI on the 
> partition
> is the one actually takes effect.
>
> 2)
> ALTER TABLE ADD PRIMARY KEY;
> ALTER TABLE DROP CONSTRAINT "PRIMAEY KEY";
>
> If the replica identity is default, it will use the primary key. we might also
> need to prevent user from adding or removing primary key in this case.
>
>
> Based on the above cases, the RI check seems could bring considerable amount 
> of
> code. So, how about we follow what we already did in 
> CheckCmdReplicaIdentity(),
> we can put the check for RI in that function, so that we can cover all the
> cases and reduce the code change. And if we are worried about the cost of do
> the check for UPDATE and DELETE every time, we can also save the result in the
> relcache. It's safe because every operation change the RI will invalidate the
> relcache. We are using this approach in row filter patch to make sure all
> columns in row filter expression are part of RI.
>


Another point related to RI is that this patch seems to restrict
specifying the RI columns in the column filter list irrespective of
publish action. Do we need to have such a restriction if the
publication publishes 'insert' or 'truncate'?

-- 
With Regards,
Amit Kapila.

Re: Column Filtering in Logical Replication

Reply via email to