Hi all, Thank you all for the comments.
If there is no further comment, I will open the voting thread in 3 days. Regards, Xiangyu Ron Liu <ron9....@gmail.com>于2025年2月19日 周三17:58写道: > Hi, Xiangyu > > Thaks for updates, LGTM > > Best, > Ron > > xiangyu feng <xiangyu...@gmail.com> 于2025年2月19日周三 17:13写道: > >> Hi Ron, >> >> Thx for ur advice. I've made the changes to current FLIP[1] including >> renaming the interface and remove the default implementation. As we have >> discussed, the target columns will be compared in sink reuse if the sink >> has not implemented the `SupportsTargetColumnWriting` ability. This will >> make sure the sink reuse feature can still be safely enabled by default. >> >> [1] >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner >> >> Best, >> Xiangyu Feng >> >> Ron Liu <ron9....@gmail.com> 于2025年2月19日周三 10:25写道: >> >>> Hi Xiangyu, >>> >>> Thanks for your reply, the updates LGTM overall. >>> >>> 1. Regarding the naming of the interface, what do you think about >>> calling it SupportsTargetColumnWriting? Here I would like to emphasize the >>> support for partial column writing, and I personally think the naming can >>> be aligned with SupportsWritingMetadata. >>> >>> 2. Regarding the interface methods, is it necessary to provide a default >>> implementation, do most of the stores support partial column writing? >>> >>> >>> Best, >>> Ron >>> >>> Cong Cheng <congchengch...@gmail.com> 于2025年2月18日周二 16:12写道: >>> >>>> Hi Xiangyu, >>>> >>>> Introduce a new sink ability interface named >>>> `SupportsTargetColumnUpdate`, >>>> > this interface will tell the planner if the sink has considered the >>>> target >>>> > columns information in its implementation; >>>> >>>> >>>> I think it makes a lot of sense, +1 for this ability. >>>> >>>> Sorry to all that I sended the draft of the content twice, something >>>> wrong with the enter of my keyboard. >>>> >>>> Best, >>>> Cong Cheng >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> xiangyu feng <xiangyu...@gmail.com> 于2025年2月18日周二 15:06写道: >>>> >>>> > Hi Kevin, >>>> > >>>> > Thx for ur valuable suggestion. I've made a few changes to current >>>> FLIP[1]. >>>> > >>>> > 1, Introduce a new sink ability interface named >>>> > `SupportsTargetColumnUpdate`, this interface will tell the planner if >>>> the >>>> > sink has considered the target columns information in its >>>> implementation; >>>> > >>>> > 2, This ability will return true by default for safety consideration; >>>> > >>>> > 3, When generating node digest for sink reuse, the digest will only >>>> ignore >>>> > the target column infos when this ability return false. This will >>>> require >>>> > extra work for specific sink. >>>> > >>>> > By applying these changes, we can safely enable the sink reuse >>>> feature by >>>> > default even for sinks like JDBC . And for sinks like Paimon, we can >>>> also >>>> > reuse the sink node across multiple partial-update streams with >>>> different >>>> > target columns by revising paimon sink to implement this interface. >>>> > >>>> > Glad to hear you back for these updates. >>>> > >>>> > [1] >>>> > >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner >>>> > >>>> > >>>> > >>>> > Kevin Cheng <congchengch...@gmail.com> 于2025年2月14日周五 16:13写道: >>>> > >>>> >> Hi Xiangyu and Ron, >>>> >> >>>> >> I agree that sink reuse can be enabled by default from Flink Planner >>>> >> perspective. But the planner should be informed by Sink Connector >>>> that >>>> >> whether the planner can reuse different sink with different target >>>> >> columns. >>>> >> >>>> >> Take JBDC sink as an example, under partial update circumstances, >>>> the JDBC >>>> >> needs to know the target sink or update columns of every record. >>>> However, >>>> >> the target columns info is discarded in current FLIP design. >>>> >> >>>> >> Best, >>>> >> Xiangyu >>>> >> >>>> >> xiangyu feng <xiangyu...@gmail.com> 于2025年2月14日周五 13:51写道: >>>> >> >>>> >> > Hi Ron, >>>> >> > >>>> >> > After second thought, taking sink reuse as a long awaited feature >>>> >> > with significant benefits for users, I agree to enable this in the >>>> first >>>> >> > place. Similar features like >>>> `table.optimizer.reuse-sub-plan-enabled` >>>> >> and >>>> >> > `table.optimizer.reuse-source-enabled` are also enabled by >>>> default. From >>>> >> > this point of view, sink reuse should be the same. >>>> >> > >>>> >> > The Flip[1] has been revised accordingly. Thx for suggestion. >>>> >> > >>>> >> > [1] >>>> >> > >>>> >> > >>>> >> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner >>>> >> > >>>> >> > Regards, >>>> >> > Xiangyu >>>> >> > >>>> >> > >>>> >> > >>>> >> > >>>> >> > Ron Liu <ron9....@gmail.com> 于2025年2月14日周五 12:10写道: >>>> >> > >>>> >> > > Hi, Xiangyu >>>> >> > > >>>> >> > > >>> I prefer to set the default value of this option as'false in >>>> the >>>> >> > first >>>> >> > > place. Setting as true might introduce unexpected behavior for >>>> users >>>> >> > when >>>> >> > > operating existing jobs. Maybe we should introduce this feature >>>> at >>>> >> first >>>> >> > > and discuss enabling this feature as default in a separated >>>> thread. >>>> >> WDYT? >>>> >> > > >>>> >> > > 1. What unexpected behaviors do you think this might introduce? >>>> For >>>> >> Sink >>>> >> > > nodes, which are generally stateless, I intuitively understand >>>> that no >>>> >> > > state compatibility issues will be introduced after Sink reuse. >>>> >> > > >>>> >> > > 2. Since Sink reuse benefits users, why not enable this feature >>>> by >>>> >> > default >>>> >> > > on the first day it is introduced? If your concern is potential >>>> >> unhandled >>>> >> > > corner cases in the implementation, I consider those to be bugs. >>>> We >>>> >> > should >>>> >> > > prioritize fixing them rather than blocking the default >>>> enablement of >>>> >> > this >>>> >> > > optimization. >>>> >> > > >>>> >> > > 3. If we don't enable it by default now, when should we? What >>>> specific >>>> >> > > milestones or actions are needed during the waiting period? Your >>>> >> > concerns >>>> >> > > about unintended behaviors would still exist even if we enable it >>>> >> later. >>>> >> > > Why delay resolving this in a separate discussion instead of >>>> >> finalizing >>>> >> > it >>>> >> > > here? >>>> >> > > >>>> >> > > 4. From our internal practice, users still want to enjoy the >>>> benefits >>>> >> of >>>> >> > > this feature by default. >>>> >> > > >>>> >> > > >>>> >> > > Best, >>>> >> > > Ron >>>> >> > > >>>> >> > > xiangyu feng <xiangyu...@gmail.com> 于2025年2月13日周四 15:57写道: >>>> >> > > >>>> >> > > > Hi Ron, >>>> >> > > > >>>> >> > > > Thx for quick response. >>>> >> > > > >>>> >> > > > - should the default value be true for the newly introduced >>>> option >>>> >> > > > `table.optimizer.reuse-sink-enabled`? >>>> >> > > > >>>> >> > > > I prefer to set the default value of this option as'false in >>>> the >>>> >> first >>>> >> > > > place. Setting as true might introduce unexpected behavior for >>>> >> users >>>> >> > > when >>>> >> > > > operating existing jobs. Maybe we should introduce this >>>> feature at >>>> >> > first >>>> >> > > > and discuss enabling this feature as default in a separated >>>> thread. >>>> >> > WDYT? >>>> >> > > > >>>> >> > > > - have you considered the technical implementation options and >>>> are >>>> >> they >>>> >> > > > feasible? >>>> >> > > > >>>> >> > > > Yes, we have already implemented the POC internally. It works >>>> well. >>>> >> > > > >>>> >> > > > Looking forward for your feedback. >>>> >> > > > >>>> >> > > > Best, >>>> >> > > > Xiangyu >>>> >> > > > >>>> >> > > > Ron Liu <ron9....@gmail.com> 于2025年2月13日周四 14:55写道: >>>> >> > > > >>>> >> > > > > Hi, Xiangyu >>>> >> > > > > >>>> >> > > > > Thank you for proposing this FLIP, it's great work and looks >>>> very >>>> >> > > useful >>>> >> > > > > for users. >>>> >> > > > > >>>> >> > > > > I have the following two questions regarding the content of >>>> the >>>> >> FLIP: >>>> >> > > > > 1. Since sink reuse is very useful, should the default value >>>> be >>>> >> true >>>> >> > > for >>>> >> > > > > the newly introduced option >>>> `table.optimizer.reuse-sink-enabled`, >>>> >> and >>>> >> > > > > should the engine enable this optimization by default. >>>> Currently >>>> >> for >>>> >> > > > source >>>> >> > > > > reuse, the default value of >>>> >> > `sql.optimizer.reuse.table-source.enabled` >>>> >> > > > > option is also true, which does not require user access by >>>> >> default, >>>> >> > so >>>> >> > > I >>>> >> > > > > think the engine should turn on Sink reuse optimization by >>>> >> default. >>>> >> > > > > 2. Regarding Sink Digest, you mentioned disregarding the sink >>>> >> target >>>> >> > > > > column, which I think is a very good suggestion, and very >>>> useful >>>> >> if >>>> >> > it >>>> >> > > > can >>>> >> > > > > be done. I have a question: have you considered the technical >>>> >> > > > > implementation options and are they feasible? >>>> >> > > > > >>>> >> > > > > Best, >>>> >> > > > > Ron >>>> >> > > > > >>>> >> > > > > xiangyu feng <xiangyu...@gmail.com> 于2025年2月13日周四 12:56写道: >>>> >> > > > > >>>> >> > > > > > Hi all, >>>> >> > > > > > >>>> >> > > > > > Thank you all for the comments. >>>> >> > > > > > >>>> >> > > > > > If there is no further comment, I will open the voting >>>> thread >>>> >> in 3 >>>> >> > > > days. >>>> >> > > > > > >>>> >> > > > > > Regards, >>>> >> > > > > > Xiangyu >>>> >> > > > > > >>>> >> > > > > > xiangyu feng <xiangyu...@gmail.com> 于2025年2月11日周二 14:17写道: >>>> >> > > > > > >>>> >> > > > > > > Link for Paimon LocalMerge Operator[1] >>>> >> > > > > > > >>>> >> > > > > > > [1] >>>> >> > > > > > > >>>> >> > > > > > >>>> >> > > > > >>>> >> > > > >>>> >> > > >>>> >> > >>>> >> >>>> https://paimon.apache.org/docs/master/maintenance/write-performance/#local-merging >>>> >> > > > > > > >>>> >> > > > > > > xiangyu feng <xiangyu...@gmail.com> 于2025年2月11日周二 >>>> 14:03写道: >>>> >> > > > > > > >>>> >> > > > > > >> Follow the above, >>>> >> > > > > > >> >>>> >> > > > > > >> "And for SinkWriter, the data structure to be processed >>>> >> should >>>> >> > be >>>> >> > > > > > fixed." >>>> >> > > > > > >> >>>> >> > > > > > >> I'm not very sure why the data structure of SinkWriter >>>> >> should be >>>> >> > > > > fixed. >>>> >> > > > > > >> Can you elaborate the scenario here? >>>> >> > > > > > >> >>>> >> > > > > > >> "Is there a node or an operator to fill in the >>>> inconsistent >>>> >> > field >>>> >> > > > of >>>> >> > > > > > >> Rowdata that passed from different Sources?" >>>> >> > > > > > >> >>>> >> > > > > > >> By `filling in the inconsistent field from different >>>> >> sources`, >>>> >> > do >>>> >> > > > you >>>> >> > > > > > >> refer to implementations like the LocalMerge Operator >>>> [1] for >>>> >> > > > Paimon? >>>> >> > > > > > IMHO, >>>> >> > > > > > >> this should not be included in the Sink Reuse. The >>>> merging >>>> >> > > behavior >>>> >> > > > of >>>> >> > > > > > >> multiple sources should be considered inside of the >>>> sink. >>>> >> > > > > > >> >>>> >> > > > > > >> Regards, >>>> >> > > > > > >> Xiangyu Feng >>>> >> > > > > > >> >>>> >> > > > > > >> xiangyu feng <xiangyu...@gmail.com> 于2025年2月11日周二 >>>> 13:46写道: >>>> >> > > > > > >> >>>> >> > > > > > >>> Hi Yanquan, >>>> >> > > > > > >>> >>>> >> > > > > > >>> Thx for reply. IIUC, the schema of CatalogTable should >>>> >> contain >>>> >> > > all >>>> >> > > > > > >>> target columns for sources. If not, a SQL validation >>>> >> exception >>>> >> > > > should >>>> >> > > > > > be >>>> >> > > > > > >>> raised for planner. >>>> >> > > > > > >>> >>>> >> > > > > > >>> Regards, >>>> >> > > > > > >>> Xiangyu Feng >>>> >> > > > > > >>> >>>> >> > > > > > >>> >>>> >> > > > > > >>> >>>> >> > > > > > >>> Yanquan Lv <decq12y...@gmail.com> 于2025年2月10日周一 >>>> 16:25写道: >>>> >> > > > > > >>> >>>> >> > > > > > >>>> Hi, Xiangyu. Thanks for driving this. >>>> >> > > > > > >>>> >>>> >> > > > > > >>>> I have a question to confirm: >>>> >> > > > > > >>>> Considering the case that different Sources use >>>> different >>>> >> > > > > columns[1], >>>> >> > > > > > >>>> will the Schema of CatalogTable[2] contain all target >>>> >> columns >>>> >> > > for >>>> >> > > > > > Sources? >>>> >> > > > > > >>>> And for SinkWriter, the data structure to be processed >>>> >> should >>>> >> > be >>>> >> > > > > > fixed. >>>> >> > > > > > >>>> Is there a node or an operator to fill in the >>>> inconsistent >>>> >> > field >>>> >> > > > of >>>> >> > > > > > Rowdata >>>> >> > > > > > >>>> that passed from different Sources? >>>> >> > > > > > >>>> >>>> >> > > > > > >>>> [1] >>>> >> > > > > > >>>> >>>> >> > > > > > >>>> >> > > > > >>>> >> > > > >>>> >> > > >>>> >> > >>>> >> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner >>>> >> > > > > > >>>> [2] >>>> >> > > > > > >>>> >>>> >> > > > > > >>>> >> > > > > >>>> >> > > > >>>> >> > > >>>> >> > >>>> >> >>>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#planning >>>> >> > > > > > >>>> >>>> >> > > > > > >>>> >>>> >> > > > > > >>>> >>>> >> > > > > > >>>> > 2025年2月6日 17:06,xiangyu feng <xiangyu...@gmail.com> >>>> 写道: >>>> >> > > > > > >>>> > >>>> >> > > > > > >>>> > Hi devs, >>>> >> > > > > > >>>> > >>>> >> > > > > > >>>> > I'm opening this thread to discuss FLIP-506: Support >>>> >> Reuse >>>> >> > > > > Multiple >>>> >> > > > > > >>>> Table >>>> >> > > > > > >>>> > Sinks in Planner[1]. >>>> >> > > > > > >>>> > >>>> >> > > > > > >>>> > Currently if users want to partial-update a >>>> downstream >>>> >> table >>>> >> > > > from >>>> >> > > > > > >>>> multiple >>>> >> > > > > > >>>> > source tables in one datastream, they would have to >>>> >> manually >>>> >> > > > union >>>> >> > > > > > all >>>> >> > > > > > >>>> > source tables and add lots of "cast(null as string) >>>> as >>>> >> xxx" >>>> >> > in >>>> >> > > > > Flink >>>> >> > > > > > >>>> SQL. >>>> >> > > > > > >>>> > This will make the SQL here hard to use and >>>> maintain. >>>> >> > > > > > >>>> > >>>> >> > > > > > >>>> > After discussing with Weijie Guo, we think that by >>>> >> > supporting >>>> >> > > > > reuse >>>> >> > > > > > >>>> sink >>>> >> > > > > > >>>> > nodes in planner, the usability can be greatly >>>> improved >>>> >> in >>>> >> > > this >>>> >> > > > > > case. >>>> >> > > > > > >>>> > >>>> >> > > > > > >>>> > Therefore, we propose to add a new option >>>> >> > > > > > >>>> > *`table.optimizer.reuse-sink-enabled`* here to >>>> support >>>> >> this >>>> >> > > > > feature. >>>> >> > > > > > >>>> More >>>> >> > > > > > >>>> > details can be found in the FLIP. >>>> >> > > > > > >>>> > >>>> >> > > > > > >>>> > Looking forward to your feedback, thanks. >>>> >> > > > > > >>>> > >>>> >> > > > > > >>>> > [1] >>>> >> > > > > > >>>> > >>>> >> > > > > > >>>> >>>> >> > > > > > >>>> >> > > > > >>>> >> > > > >>>> >> > > >>>> >> > >>>> >> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner >>>> >> > > > > > >>>> > >>>> >> > > > > > >>>> > Best regards, >>>> >> > > > > > >>>> > Xiangyu Feng >>>> >> > > > > > >>>> >>>> >> > > > > > >>>> >>>> >> > > > > > >>>> >> > > > > >>>> >> > > > >>>> >> > > >>>> >> > >>>> >> >>>> > >>>> >>>