On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > ... > > Hmm, I think the gain via caching is not visible because we are using > simple expressions. It will be visible when we use somewhat complex > expressions where expression evaluation cost is significant. > Similarly, the impact of this change will magnify and it will also be > visible when a publication has many tables. Apart from performance, > this change is logically correct as well because it would be any way > better if we don't invalidate the cached expressions unless required.
Please tell me what is your idea of a "complex" row filter expression. Do you just mean a filter that has multiple AND conditions in it? I don't really know if few complex expressions would amount to any significant evaluation costs, so I would like to run some timing tests with some real examples to see the results. On Wed, Aug 25, 2021 at 6:31 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Wed, Aug 25, 2021 at 10:57 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > On Wed, Aug 25, 2021 at 5:52 AM Euler Taveira <eu...@eulerto.com> wrote: > > > > > > On Tue, Aug 24, 2021, at 4:46 AM, Peter Smith wrote: > > > > > > > Anyway, I have implemented the suggested cache change because I agree > > > it is probably theoretically superior, even if in practice there is > > > almost no difference. > > > > > > I didn't inspect your patch carefully but it seems you add another List to > > > control this new cache mechanism. I don't like it. IMO if we can use the > > > data > > > structures that we have now, let's implement your idea; otherwise, -1 for > > > this > > > new micro optimization. > > > > > > > As mentioned above, without this we will invalidate many cached > > expressions even though it is not required. I don't deny that there > > might be a better way to achieve the same and if you or Peter have any > > ideas, I am all ears. > > > > I see that the new list is added to store row_filter node which we > later use to compute expression. This is not required for invalidation > but for delaying the expression evaluation till it is required (for > example, for truncate, we may not need the row evaluation, so there is > no need to compute it). Can we try to postpone the syscache lookup to > a later stage when we are actually doing row_filtering? If we can do > that, then I think we can avoid having this extra list? Yes, you are correct - that Node list was re-instated only because you had requested that the ExprState evaluation should be deferred until it is needed by the pgoutput_row_filter. Otherwise, the additional list would not be needed so everything would be much the same as in v23 except the invalidations would be more focussed on single tables. I don't think the syscache lookup can be easily postponed. That logic of get_rel_sync_entry processes the table filters of *all* publications, so moving that publications loop (including the partition logic) into the pgoutput_row_filter seems a bridge too far IMO. Furthermore, I am not yet convinced that this ExprState postponement is very useful. It may be true that for truncate there is no need to compute it, but consider that the user would never even define a row filter in the first place unless they intended there will be some CRUD operations. So even if the truncate does not need the filter, *something* is surely going to need it. In other words, IIUC this postponement is not going to save any time overall - it only shifting when the (one time) expression evaluation will happen. I feel it would be better to just remove the postponed evaluation of the ExprState added in v24. That will remove any need for the extra Node list (which I think is Euler's concern). The ExprState cache will still be slightly improved from how it was implemented before because it is "logically correct" that we don't invalidate the cached expressions unless required. Thoughts? ------ Kind Regards, Peter Smith. Fujitsu Australia