Re: row filtering for logical replication

Amit Kapila Fri, 17 Dec 2021 18:33:46 -0800

On Fri, Dec 17, 2021 at 5:29 PM Greg Nancarrow <[email protected]> wrote:
>
> On Fri, Dec 17, 2021 at 7:20 PM Ajin Cherian <[email protected]> wrote:
> >
> > On Fri, Dec 17, 2021 at 5:46 PM Greg Nancarrow <[email protected]> wrote:
> >
> > > So using the v47 patch-set, I still find that the UPDATE above results in 
> > > publication of an INSERT of (2,1), rather than an UPDATE of (1,1) to 
> > > (2,1).
> > > This is according to the 2nd UPDATE rule below, from patch 0003.
> > >
> > > + * old-row (no match)    new-row (no match)  -> (drop change)
> > > + * old-row (no match)    new row (match)     -> INSERT
> > > + * old-row (match)       new-row (no match)  -> DELETE
> > > + * old-row (match)       new row (match)     -> UPDATE
> > >
> > > This is because the old row (1,1) doesn't match the UPDATE filter 
> > > "(a>1)", but the new row (2,1) does.
> > > This functionality doesn't seem right to me. I don't think it can be 
> > > assumed that (1,1) was never published (and thus requires an INSERT 
> > > rather than UPDATE) based on these checks, because in this example, (1,1) 
> > > was previously published via a different operation - INSERT (and using a 
> > > different filter too).
> > > I think the fundamental problem here is that these UPDATE rules assume 
> > > that the old (current) row was previously UPDATEd (and published, or not 
> > > published, according to the filter applicable to UPDATE), but this is not 
> > > necessarily the case.
> > > Or am I missing something?
> >
> > But it need not be correct in assuming that the old-row was part of a
> > previous INSERT either (and published, or not published according to
> > the filter applicable to an INSERT).
> > For example, change the sequence of inserts and updates prior to the
> > last update:
> >
> > truncate tbl1 ;
> > insert into tbl1 values (1,5); ==> not replicated since insert and ! (b < 
> > 2);
> > update tbl1 set b = 1; ==> not replicated since update and ! (a > 1)
> > update tbl1 set a = 2; ==> replicated and update converted to insert
> > since (a > 1)
> >
> > In this case, the last update "update tbl1 set a = 2; " is updating a
> > row that was previously updated and not inserted and not replicated to
> > the subscriber.
> > How does the replication logic differentiate between these two cases,
> > and decide if the update was previously published or not?
> > I think it's futile for the publisher side to try and figure out the
> > history of published rows. In fact, if this level of logic is required
> > then it is best implemented on the subscriber side, which then defeats
> > the purpose of a publication filter.
> >
>
> I think it's a concern, for such a basic example with only one row,
> getting unpredictable (and even wrong) replication results, depending
> upon the order of operations.
>


I am not sure how we can deduce that. The results are based on current
and new values of row which is what I think we are expecting here.

> Doesn't this problem result from allowing different WHERE clauses for
> different pubactions for the same table?
> My current thoughts are that this shouldn't be allowed, and also WHERE
> clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> using only columns covered by the replica identity or primary key.
>

Hmm, even if we do that one could have removed the insert row filter
by the time we are evaluating the update. So, we will get the same
result. I think the behavior in your example is as we expect as per
the specs defined by the patch and I don't see any problem, in this
case, w.r.t replication results. Let us see what others think on this?



--
With Regards,
Amit Kapila.

Re: row filtering for logical replication

Reply via email to