On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
> <khannashubham1...@gmail.com> wrote:
> >
> > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> > <dangwalrajendra...@gmail.com> wrote:
> > >
> > > Hi PG Hackers.
> > >
> > > We are interested in enhancing the functionality of the pgoutput plugin
> > > by adding support for generated columns.
> > > Could you please guide us on the necessary steps to achieve this?
> > > Additionally, do you have a platform for tracking such feature requests?
> > > Any insights or assistance you can provide on this matter would be
> > > greatly appreciated.
> >
> > The attached patch has the changes to support capturing generated
> > column data using 'pgoutput' and 'test_decoding' plugin. Now if the
> > 'include_generated_columns' option is specified, the generated column
> > information and generated column data also will be sent.
>
> As Euler mentioned earlier, I think it's a decision not to replicate
> generated columns because we don't know the target table on the
> subscriber has the same expression and there could be locale issues
> even if it looks the same. I can see that a benefit of this proposal
> would be to save cost to compute generated column values if the user
> wants the target table on the subscriber to have exactly the same data
> as the publisher's one. Are there other benefits or use cases?
>

Cost is one benefit. Another is that the user may not want the data to
differ because of volatile functions like timeofday(), or the table on
the subscriber may not have the column marked as generated.

Now, considering such use cases, is providing a subscription-level
option a good idea, as the patch does? I understand that this can serve
the purpose, but it would also apply the same behavior to all the tables
in all the publications for a subscription, which may or may not be what
the user expects. This could lead to some performance overhead (due to
always sending generated columns for all the tables) in cases where the
user needs it only for a subset of tables.

I think we should consider making it a table-level option, specified in
some way while defining the publication. A few ideas could be:

(a) We ask users to explicitly mention the generated column in the
column list while defining the publication. This has the drawback that
users need to specify the column list even when all columns need to be
replicated.

(b) We can have some new syntax to indicate the same, like:
CREATE PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3,
t4 INCLUDE ..., t5;
I haven't analyzed the feasibility of this, so there could be some
challenges, but we can at least investigate it.

Yet another idea is to keep this as a publication option
(include_generated_columns or publish_generated_columns) similar to
"publish_via_partition_root". Normally, "publish_via_partition_root" is
used when tables on either side have different partition hierarchies,
which is somewhat the case here.

Thoughts?

--
With Regards,
Amit Kapila.