Hi all,

Thank you all for the comments.

If there is no further comment, I will open the voting thread in 3 days.

Regards,
Xiangyu


Ron Liu <ron9....@gmail.com>于2025年2月19日 周三17:58写道:

> Hi, Xiangyu
>
> Thaks for updates, LGTM
>
> Best,
> Ron
>
> xiangyu feng <xiangyu...@gmail.com> 于2025年2月19日周三 17:13写道:
>
>> Hi Ron,
>>
>> Thx for ur advice.  I've made the changes to current FLIP[1] including
>> renaming the interface and remove the default implementation. As we have
>> discussed, the target columns will be compared in sink reuse if the sink
>> has not implemented the `SupportsTargetColumnWriting` ability. This will
>> make sure the sink reuse feature can still be safely enabled by default.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
>>
>> Best,
>> Xiangyu Feng
>>
>> Ron Liu <ron9....@gmail.com> 于2025年2月19日周三 10:25写道:
>>
>>> Hi Xiangyu,
>>>
>>> Thanks for your reply, the updates LGTM  overall.
>>>
>>> 1. Regarding the naming of the interface, what do you think about
>>> calling it SupportsTargetColumnWriting? Here I would like to emphasize the
>>> support for partial column writing, and I personally think the naming can
>>> be aligned with SupportsWritingMetadata.
>>>
>>> 2. Regarding the interface methods, is it necessary to provide a default
>>> implementation, do most of the stores support partial column writing?
>>>
>>>
>>> Best,
>>> Ron
>>>
>>> Cong Cheng <congchengch...@gmail.com> 于2025年2月18日周二 16:12写道:
>>>
>>>> Hi Xiangyu,
>>>>
>>>> Introduce a new sink ability interface named
>>>> `SupportsTargetColumnUpdate`,
>>>> > this interface will tell the planner if the sink has considered the
>>>> target
>>>> > columns information in its implementation;
>>>>
>>>>
>>>> I think it makes a lot of sense, +1 for this ability.
>>>>
>>>> Sorry to all that I sended the draft of the content twice, something
>>>> wrong with the enter of my keyboard.
>>>>
>>>> Best,
>>>> Cong Cheng
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> xiangyu feng <xiangyu...@gmail.com> 于2025年2月18日周二 15:06写道:
>>>>
>>>> > Hi Kevin,
>>>> >
>>>> > Thx for ur valuable suggestion. I've made a few changes to current
>>>> FLIP[1].
>>>> >
>>>> > 1, Introduce a new sink ability interface named
>>>> > `SupportsTargetColumnUpdate`, this interface will tell the planner if
>>>> the
>>>> > sink has considered the target columns information in its
>>>> implementation;
>>>> >
>>>> > 2, This ability will return true by default for safety consideration;
>>>> >
>>>> > 3, When generating node digest for sink reuse, the digest will only
>>>> ignore
>>>> > the target column infos when this ability return false. This will
>>>> require
>>>> > extra work for specific sink.
>>>> >
>>>> > By applying these changes, we can safely enable the sink reuse
>>>> feature by
>>>> > default even for sinks like JDBC . And for sinks like Paimon, we can
>>>> also
>>>> > reuse the sink node across multiple partial-update streams with
>>>> different
>>>> > target columns by revising paimon sink to implement this interface.
>>>> >
>>>> > Glad to hear you back for these updates.
>>>> >
>>>> > [1]
>>>> >
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
>>>> >
>>>> >
>>>> >
>>>> > Kevin Cheng <congchengch...@gmail.com> 于2025年2月14日周五 16:13写道:
>>>> >
>>>> >> Hi Xiangyu and Ron,
>>>> >>
>>>> >> I agree that sink reuse can be enabled by default from Flink Planner
>>>> >> perspective. But the planner should be informed by Sink Connector
>>>> that
>>>> >> whether the planner can reuse different sink with different target
>>>> >> columns.
>>>> >>
>>>> >> Take JBDC sink as an example, under partial update circumstances,
>>>> the JDBC
>>>> >> needs to know the target sink or update columns of every record.
>>>> However,
>>>> >> the target columns info is discarded in current FLIP design.
>>>> >>
>>>> >> Best,
>>>> >> Xiangyu
>>>> >>
>>>> >> xiangyu feng <xiangyu...@gmail.com> 于2025年2月14日周五 13:51写道:
>>>> >>
>>>> >> > Hi Ron,
>>>> >> >
>>>> >> > After second thought, taking sink reuse as a long awaited feature
>>>> >> > with significant benefits for users, I agree to enable this in the
>>>> first
>>>> >> > place.  Similar features like
>>>> `table.optimizer.reuse-sub-plan-enabled`
>>>> >> and
>>>> >> > `table.optimizer.reuse-source-enabled` are also enabled by
>>>> default. From
>>>> >> > this point of view, sink reuse should be the same.
>>>> >> >
>>>> >> > The Flip[1] has been revised accordingly. Thx for suggestion.
>>>> >> >
>>>> >> > [1]
>>>> >> >
>>>> >> >
>>>> >>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
>>>> >> >
>>>> >> > Regards,
>>>> >> > Xiangyu
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > Ron Liu <ron9....@gmail.com> 于2025年2月14日周五 12:10写道:
>>>> >> >
>>>> >> > > Hi, Xiangyu
>>>> >> > >
>>>> >> > > >>> I prefer to set the default value of this option as'false in
>>>> the
>>>> >> > first
>>>> >> > > place.  Setting as true might introduce unexpected behavior for
>>>> users
>>>> >> > when
>>>> >> > > operating existing jobs. Maybe we should introduce this feature
>>>> at
>>>> >> first
>>>> >> > > and discuss enabling this feature as default in a separated
>>>> thread.
>>>> >> WDYT?
>>>> >> > >
>>>> >> > > 1. What unexpected behaviors do you think this might introduce?
>>>> For
>>>> >> Sink
>>>> >> > > nodes, which are generally stateless, I intuitively understand
>>>> that no
>>>> >> > > state compatibility issues will be introduced after Sink reuse.
>>>> >> > >
>>>> >> > > 2. Since Sink reuse benefits users, why not enable this feature
>>>> by
>>>> >> > default
>>>> >> > > on the first day it is introduced? If your concern is potential
>>>> >> unhandled
>>>> >> > > corner cases in the implementation, I consider those to be bugs.
>>>> We
>>>> >> > should
>>>> >> > > prioritize fixing them rather than blocking the default
>>>> enablement of
>>>> >> > this
>>>> >> > > optimization.
>>>> >> > >
>>>> >> > > 3. If we don't enable it by default now, when should we? What
>>>> specific
>>>> >> > > milestones or actions are needed during the waiting period?  Your
>>>> >> > concerns
>>>> >> > > about unintended behaviors would still exist even if we enable it
>>>> >> later.
>>>> >> > > Why delay resolving this in a separate discussion instead of
>>>> >> finalizing
>>>> >> > it
>>>> >> > > here?
>>>> >> > >
>>>> >> > > 4. From our internal practice, users still want to enjoy the
>>>> benefits
>>>> >> of
>>>> >> > > this feature by default.
>>>> >> > >
>>>> >> > >
>>>> >> > > Best,
>>>> >> > > Ron
>>>> >> > >
>>>> >> > > xiangyu feng <xiangyu...@gmail.com> 于2025年2月13日周四 15:57写道:
>>>> >> > >
>>>> >> > > >  Hi Ron,
>>>> >> > > >
>>>> >> > > > Thx for quick response.
>>>> >> > > >
>>>> >> > > > - should the default value be true for the newly introduced
>>>> option
>>>> >> > > > `table.optimizer.reuse-sink-enabled`?
>>>> >> > > >
>>>> >> > > > I prefer to set the default value of this option as'false in
>>>> the
>>>> >> first
>>>> >> > > > place.  Setting as true might introduce unexpected behavior for
>>>> >> users
>>>> >> > > when
>>>> >> > > > operating existing jobs. Maybe we should introduce this
>>>> feature at
>>>> >> > first
>>>> >> > > > and discuss enabling this feature as default in a separated
>>>> thread.
>>>> >> > WDYT?
>>>> >> > > >
>>>> >> > > > - have you considered the technical implementation options and
>>>> are
>>>> >> they
>>>> >> > > > feasible?
>>>> >> > > >
>>>> >> > > > Yes, we have already implemented the POC internally. It works
>>>> well.
>>>> >> > > >
>>>> >> > > > Looking forward for your feedback.
>>>> >> > > >
>>>> >> > > > Best,
>>>> >> > > > Xiangyu
>>>> >> > > >
>>>> >> > > > Ron Liu <ron9....@gmail.com> 于2025年2月13日周四 14:55写道:
>>>> >> > > >
>>>> >> > > > > Hi, Xiangyu
>>>> >> > > > >
>>>> >> > > > > Thank you for proposing this FLIP, it's great work and looks
>>>> very
>>>> >> > > useful
>>>> >> > > > > for users.
>>>> >> > > > >
>>>> >> > > > > I have the following two questions regarding the content of
>>>> the
>>>> >> FLIP:
>>>> >> > > > > 1. Since sink reuse is very useful, should the default value
>>>> be
>>>> >> true
>>>> >> > > for
>>>> >> > > > > the newly introduced option
>>>> `table.optimizer.reuse-sink-enabled`,
>>>> >> and
>>>> >> > > > > should the engine enable this optimization by default.
>>>> Currently
>>>> >> for
>>>> >> > > > source
>>>> >> > > > > reuse, the default value of
>>>> >> > `sql.optimizer.reuse.table-source.enabled`
>>>> >> > > > > option is also true, which does not require user access by
>>>> >> default,
>>>> >> > so
>>>> >> > > I
>>>> >> > > > > think the engine should turn on Sink reuse optimization by
>>>> >> default.
>>>> >> > > > > 2. Regarding Sink Digest, you mentioned disregarding the sink
>>>> >> target
>>>> >> > > > > column, which I think is a very good suggestion, and very
>>>> useful
>>>> >> if
>>>> >> > it
>>>> >> > > > can
>>>> >> > > > > be done. I have a question: have you considered the technical
>>>> >> > > > > implementation options and are they feasible?
>>>> >> > > > >
>>>> >> > > > > Best,
>>>> >> > > > > Ron
>>>> >> > > > >
>>>> >> > > > > xiangyu feng <xiangyu...@gmail.com> 于2025年2月13日周四 12:56写道:
>>>> >> > > > >
>>>> >> > > > > > Hi all,
>>>> >> > > > > >
>>>> >> > > > > > Thank you all for the comments.
>>>> >> > > > > >
>>>> >> > > > > > If there is no further comment, I will open the voting
>>>> thread
>>>> >> in 3
>>>> >> > > > days.
>>>> >> > > > > >
>>>> >> > > > > > Regards,
>>>> >> > > > > > Xiangyu
>>>> >> > > > > >
>>>> >> > > > > > xiangyu feng <xiangyu...@gmail.com> 于2025年2月11日周二 14:17写道:
>>>> >> > > > > >
>>>> >> > > > > > > Link for Paimon LocalMerge Operator[1]
>>>> >> > > > > > >
>>>> >> > > > > > > [1]
>>>> >> > > > > > >
>>>> >> > > > > >
>>>> >> > > > >
>>>> >> > > >
>>>> >> > >
>>>> >> >
>>>> >>
>>>> https://paimon.apache.org/docs/master/maintenance/write-performance/#local-merging
>>>> >> > > > > > >
>>>> >> > > > > > > xiangyu feng <xiangyu...@gmail.com> 于2025年2月11日周二
>>>> 14:03写道:
>>>> >> > > > > > >
>>>> >> > > > > > >> Follow the above,
>>>> >> > > > > > >>
>>>> >> > > > > > >> "And for SinkWriter, the data structure to be processed
>>>> >> should
>>>> >> > be
>>>> >> > > > > > fixed."
>>>> >> > > > > > >>
>>>> >> > > > > > >> I'm not very sure why the data structure of SinkWriter
>>>> >> should be
>>>> >> > > > > fixed.
>>>> >> > > > > > >> Can you elaborate the scenario here?
>>>> >> > > > > > >>
>>>> >> > > > > > >>  "Is there a node or an operator to fill in the
>>>> inconsistent
>>>> >> > field
>>>> >> > > > of
>>>> >> > > > > > >> Rowdata that passed from different Sources?"
>>>> >> > > > > > >>
>>>> >> > > > > > >> By `filling in the inconsistent field from different
>>>> >> sources`,
>>>> >> > do
>>>> >> > > > you
>>>> >> > > > > > >> refer to implementations like the LocalMerge Operator
>>>> [1] for
>>>> >> > > > Paimon?
>>>> >> > > > > > IMHO,
>>>> >> > > > > > >> this should not be included in the Sink Reuse. The
>>>> merging
>>>> >> > > behavior
>>>> >> > > > of
>>>> >> > > > > > >> multiple sources should be considered inside of the
>>>> sink.
>>>> >> > > > > > >>
>>>> >> > > > > > >> Regards,
>>>> >> > > > > > >> Xiangyu Feng
>>>> >> > > > > > >>
>>>> >> > > > > > >> xiangyu feng <xiangyu...@gmail.com> 于2025年2月11日周二
>>>> 13:46写道:
>>>> >> > > > > > >>
>>>> >> > > > > > >>> Hi Yanquan,
>>>> >> > > > > > >>>
>>>> >> > > > > > >>> Thx for reply. IIUC, the schema of CatalogTable should
>>>> >> contain
>>>> >> > > all
>>>> >> > > > > > >>> target columns for sources. If not, a SQL validation
>>>> >> exception
>>>> >> > > > should
>>>> >> > > > > > be
>>>> >> > > > > > >>> raised for planner.
>>>> >> > > > > > >>>
>>>> >> > > > > > >>> Regards,
>>>> >> > > > > > >>> Xiangyu Feng
>>>> >> > > > > > >>>
>>>> >> > > > > > >>>
>>>> >> > > > > > >>>
>>>> >> > > > > > >>> Yanquan Lv <decq12y...@gmail.com> 于2025年2月10日周一
>>>> 16:25写道:
>>>> >> > > > > > >>>
>>>> >> > > > > > >>>> Hi, Xiangyu. Thanks for driving this.
>>>> >> > > > > > >>>>
>>>> >> > > > > > >>>> I have a question to confirm:
>>>> >> > > > > > >>>> Considering the case that different Sources use
>>>> different
>>>> >> > > > > columns[1],
>>>> >> > > > > > >>>> will the Schema of CatalogTable[2] contain all target
>>>> >> columns
>>>> >> > > for
>>>> >> > > > > > Sources?
>>>> >> > > > > > >>>> And for SinkWriter, the data structure to be processed
>>>> >> should
>>>> >> > be
>>>> >> > > > > > fixed.
>>>> >> > > > > > >>>> Is there a node or an operator to fill in the
>>>> inconsistent
>>>> >> > field
>>>> >> > > > of
>>>> >> > > > > > Rowdata
>>>> >> > > > > > >>>> that passed from different Sources?
>>>> >> > > > > > >>>>
>>>> >> > > > > > >>>> [1]
>>>> >> > > > > > >>>>
>>>> >> > > > > >
>>>> >> > > > >
>>>> >> > > >
>>>> >> > >
>>>> >> >
>>>> >>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
>>>> >> > > > > > >>>> [2]
>>>> >> > > > > > >>>>
>>>> >> > > > > >
>>>> >> > > > >
>>>> >> > > >
>>>> >> > >
>>>> >> >
>>>> >>
>>>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#planning
>>>> >> > > > > > >>>>
>>>> >> > > > > > >>>>
>>>> >> > > > > > >>>>
>>>> >> > > > > > >>>> > 2025年2月6日 17:06,xiangyu feng <xiangyu...@gmail.com>
>>>> 写道:
>>>> >> > > > > > >>>> >
>>>> >> > > > > > >>>> > Hi devs,
>>>> >> > > > > > >>>> >
>>>> >> > > > > > >>>> > I'm opening this thread to discuss FLIP-506: Support
>>>> >> Reuse
>>>> >> > > > > Multiple
>>>> >> > > > > > >>>> Table
>>>> >> > > > > > >>>> > Sinks in Planner[1].
>>>> >> > > > > > >>>> >
>>>> >> > > > > > >>>> > Currently if users want to partial-update a
>>>> downstream
>>>> >> table
>>>> >> > > > from
>>>> >> > > > > > >>>> multiple
>>>> >> > > > > > >>>> > source tables in one datastream, they would have to
>>>> >> manually
>>>> >> > > > union
>>>> >> > > > > > all
>>>> >> > > > > > >>>> > source tables and add lots of "cast(null as string)
>>>> as
>>>> >> xxx"
>>>> >> > in
>>>> >> > > > > Flink
>>>> >> > > > > > >>>> SQL.
>>>> >> > > > > > >>>> > This will make the SQL here hard to use and
>>>> maintain.
>>>> >> > > > > > >>>> >
>>>> >> > > > > > >>>> > After discussing with Weijie Guo, we think that by
>>>> >> > supporting
>>>> >> > > > > reuse
>>>> >> > > > > > >>>> sink
>>>> >> > > > > > >>>> > nodes in planner, the usability can be greatly
>>>> improved
>>>> >> in
>>>> >> > > this
>>>> >> > > > > > case.
>>>> >> > > > > > >>>> >
>>>> >> > > > > > >>>> > Therefore, we propose to add a new option
>>>> >> > > > > > >>>> > *`table.optimizer.reuse-sink-enabled`* here to
>>>> support
>>>> >> this
>>>> >> > > > > feature.
>>>> >> > > > > > >>>> More
>>>> >> > > > > > >>>> > details can be found in the FLIP.
>>>> >> > > > > > >>>> >
>>>> >> > > > > > >>>> > Looking forward to your feedback, thanks.
>>>> >> > > > > > >>>> >
>>>> >> > > > > > >>>> > [1]
>>>> >> > > > > > >>>> >
>>>> >> > > > > > >>>>
>>>> >> > > > > >
>>>> >> > > > >
>>>> >> > > >
>>>> >> > >
>>>> >> >
>>>> >>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
>>>> >> > > > > > >>>> >
>>>> >> > > > > > >>>> > Best regards,
>>>> >> > > > > > >>>> > Xiangyu Feng
>>>> >> > > > > > >>>>
>>>> >> > > > > > >>>>
>>>> >> > > > > >
>>>> >> > > > >
>>>> >> > > >
>>>> >> > >
>>>> >> >
>>>> >>
>>>> >
>>>>
>>>

Reply via email to