Re: row filtering for logical replication

Peter Smith Thu, 26 Aug 2021 15:01:58 -0700

On Thu, Aug 26, 2021 at 9:13 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Thu, Aug 26, 2021 at 3:41 PM Peter Smith <smithpb2...@gmail.com> wrote:
> >
> > On Thu, Aug 26, 2021 at 3:00 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > >
> > > On Thu, Aug 26, 2021 at 9:51 AM Peter Smith <smithpb2...@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 26, 2021 at 1:20 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > > >
> > > > > On Thu, Aug 26, 2021 at 7:37 AM Peter Smith <smithpb2...@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > > > > >
> > > > > > ...
> > > > > > >
> > > > > > > Hmm, I think the gain via caching is not visible because we are using
> > > > > > > simple expressions. It will be visible when we use somewhat complex
> > > > > > > expressions where the expression evaluation cost is significant.
> > > > > > > Similarly, the impact of this change will be magnified, and it will
> > > > > > > also be visible, when a publication has many tables. Apart from
> > > > > > > performance, this change is logically correct as well, because it is
> > > > > > > better anyway not to invalidate the cached expressions unless required.
> > > > > >
> > > > > > Please tell me what your idea of a "complex" row filter expression is.
> > > > > > Do you just mean a filter that has multiple AND conditions in it? I
> > > > > > don't really know if a few complex expressions would amount to any
> > > > > > significant evaluation cost, so I would like to run some timing tests
> > > > > > with some real examples to see the results.
> > > > > >
> > > > >
> > > > > I think this means you either didn't understand, or aren't convinced of,
> > > > > why the patch has a cache in the first place. As per your theory, even if
> > > > > we didn't have the cache it wouldn't matter, but that is not true;
> > > > > otherwise, the patch wouldn't have it.
> > > >
> > > > I have never said there should be no caching. On the contrary, my
> > > > performance test results [1] already confirmed that caching the ExprState
> > > > is beneficial for the millions of times it may be used in the
> > > > pgoutput_row_filter function. My only doubts are about how much
> > > > observable impact there would be from re-evaluating the filter
> > > > expression just a few extra times in the get_rel_sync_entry function.
> > > >
> > >
> > > I think it depends, but why would you want to allow re-evaluation in
> > > the first place when there is a way to avoid it?
> >
> > Because the current code logic of having the "delayed" ExprState
> > evaluation does come at some cost.
> >
>
> So, now you have mixed this up with the second point. Here, I was talking
> about the need for correct invalidation, but you started discussing when
> to evaluate the expression for the first time; these are different things.
>
> > And the cost is:
> > a. Needing an extra condition and more code in the function pgoutput_row_filter
> > b. Needing to maintain the additional Node list
> >
>
> I am not sure you need (b) above and I think (a) should make the
> overall code look clean.
>
> > If we chose not to implement a delayed ExprState cache evaluation, then
> > there would still be a (one-time) ExprState cache evaluation, but it
> > would happen whenever get_rel_sync_entry is called (regardless of whether
> > pgoutput_row_filter is subsequently called). E.g. there can be some
> > rebuilds of the ExprState cache if the user calls TRUNCATE.
> >
>
> Apart from Truncate, it will also be a waste if any error happens
> before the filter is actually evaluated. In the future there could be
> other operations, like replication of sequences (I have checked that the
> proposed patch for sequences uses get_rel_sync_entry), where we don't need
> to build the ExprState (as filters might or might not be there). So, it
> would be better to avoid cache lookups in those cases if possible. I still
> think expensive things like preparing expressions should ideally be done
> only when they are required.


OK. Per your suggestion, I will try to move as much of the row-filter
cache code as possible out of the get_rel_sync_entry function and into
the pgoutput_row_filter function.
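To make the deferred approach concrete, here is a minimal, self-contained
sketch (plain C, not PostgreSQL code). It assumes a simplified cache entry;
the names lookup_entry, row_filter, invalidate_entry, RelSyncEntry and
FilterState are hypothetical stand-ins for get_rel_sync_entry,
pgoutput_row_filter and the real ExprState, and the "filter" is just a
threshold comparison. It only illustrates the idea that lookup and
invalidation are cheap, and that the expensive preparation happens on the
first row that actually needs the filter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a prepared (expensive-to-build) filter expression. */
typedef struct FilterState
{
    int         threshold;       /* publish rows whose value exceeds this */
} FilterState;

/* Hypothetical, simplified relation-sync cache entry. */
typedef struct RelSyncEntry
{
    bool        has_filter;      /* publication defines a row filter */
    int         filter_param;    /* "source" the filter is prepared from */
    bool        exprstate_valid; /* has the prepared state been built? */
    FilterState exprstate;       /* meaningful only if exprstate_valid */
    int         build_count;     /* number of preparations (for illustration) */
} RelSyncEntry;

/* Cheap lookup (stand-in for get_rel_sync_entry): prepares nothing. */
static RelSyncEntry *
lookup_entry(RelSyncEntry *entry)
{
    return entry;
}

/* Invalidation only clears the flag; the rebuild is deferred to next use. */
static void
invalidate_entry(RelSyncEntry *entry)
{
    entry->exprstate_valid = false;
}

/* Stand-in for pgoutput_row_filter: prepares the filter lazily, then applies it. */
static bool
row_filter(RelSyncEntry *entry, int value)
{
    if (!entry->has_filter)
        return true;            /* no filter: publish every row */

    if (!entry->exprstate_valid)
    {
        /* the "expensive" preparation step, done only when first needed */
        entry->exprstate.threshold = entry->filter_param;
        entry->exprstate_valid = true;
        entry->build_count++;
    }
    return value > entry->exprstate.threshold;
}
```

With this shape, a path such as TRUNCATE that only calls lookup_entry never
pays for preparation, and an invalidation followed by no further row changes
costs nothing either.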

------
Kind Regards,
Peter Smith.
Fujitsu Australia
