Re: Pgoutput not capturing the generated columns

Amit Kapila Wed, 28 Aug 2024 23:17:04 -0700

On Thu, Aug 29, 2024 at 8:44 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada.m...@gmail.com> 
> > wrote:
> > >
> > > As Euler mentioned earlier, I think it's a decision not to replicate
> > > generated columns because we don't know the target table on the
> > > subscriber has the same expression and there could be locale issues
> > > even if it looks the same. I can see that a benefit of this proposal
> > > would be to save cost to compute generated column values if the user
> > > wants the target table on the subscriber to have exactly the same data
> > > as the publisher's one. Are there other benefits or use cases?
> > >
> >
> > The cost is one but the other is the user may not want the data to be
> > different based on volatile functions like timeofday()
>
> Shouldn't the generation expression be immutable?
>


Yes, I missed that point.

> > or the table on
> > subscriber won't have the column marked as generated.
>
> Yeah, it would be another use case.
>

Right, apart from that I am not aware of other use cases. If they
have, I would request Euler or Rajendra to share any other use case.

> >  Now, considering
> > such use cases, is providing a subscription-level option a good idea
> > as the patch is doing? I understand that this can serve the purpose
> > but it could also lead to having the same behavior for all the tables
> > in all the publications for a subscription which may or may not be
> > what the user expects. This could lead to some performance overhead
> > (due to always sending generated columns for all the tables) for cases
> > where the user needs it only for a subset of tables.
>
> Yeah, it's a downside and I think it's less flexible. For example, if
> users want to send both tables with generated columns and tables
> without generated columns, they would have to create at least two
> subscriptions.
>

Agreed and that would consume more resources.

> Also, they would have to include a different set of
> tables to two publications.
>
> >
> > I think we should consider it as a table-level option while defining
> > publication in some way. A few ideas could be: (a) We ask users to
> > explicitly mention the generated column in the columns list while
> > defining publication. This has a drawback such that users need to
> > specify the column list even when all columns need to be replicated.
> > (b) We can have some new syntax to indicate the same like: CREATE
> > PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
> > INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
> > could be some challenges but we can at least investigate it.
>
> I think we can create a publication for a single table, so what we can
> do with this feature can be done also by the idea you described below.
>
> > Yet another idea is to keep this as a publication option
> > (include_generated_columns or publish_generated_columns) similar to
> > "publish_via_partition_root". Normally, "publish_via_partition_root"
> > is used when tables on either side have different partition
> > hierarchies which is somewhat the case here.
>
> It sounds more useful to me.
>

Fair enough. Let's see if anyone else has any preference among the
proposed methods or can think of a better way.

-- 
With Regards,
Amit Kapila.

Re: Pgoutput not capturing the generated columns

Reply via email to