Re: [DISCUSS] FLIP-506: Support Reuse Multiple Table Sinks in Planner

xiangyu feng Thu, 13 Feb 2025 21:51:24 -0800

Hi Ron,

After second thought, taking sink reuse as a long awaited feature
with significant benefits for users, I agree to enable this in the first
place.  Similar features like `table.optimizer.reuse-sub-plan-enabled` and
`table.optimizer.reuse-source-enabled` are also enabled by default. From
this point of view, sink reuse should be the same.


The Flip[1] has been revised accordingly. Thx for suggestion.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner

Regards,
Xiangyu




Ron Liu <[email protected]> 于2025年2月14日周五 12:10写道：

> Hi, Xiangyu
>
> >>> I prefer to set the default value of this option as'false in the first
> place.  Setting as true might introduce unexpected behavior for users when
> operating existing jobs. Maybe we should introduce this feature at first
> and discuss enabling this feature as default in a separated thread. WDYT?
>
> 1. What unexpected behaviors do you think this might introduce?  For Sink
> nodes, which are generally stateless, I intuitively understand that no
> state compatibility issues will be introduced after Sink reuse.
>
> 2. Since Sink reuse benefits users, why not enable this feature by default
> on the first day it is introduced? If your concern is potential unhandled
> corner cases in the implementation, I consider those to be bugs. We should
> prioritize fixing them rather than blocking the default enablement of this
> optimization.
>
> 3. If we don't enable it by default now, when should we? What specific
> milestones or actions are needed during the waiting period?  Your concerns
> about unintended behaviors would still exist even if we enable it later.
> Why delay resolving this in a separate discussion instead of finalizing it
> here?
>
> 4. From our internal practice, users still want to enjoy the benefits of
> this feature by default.
>
>
> Best,
> Ron
>
> xiangyu feng <[email protected]> 于2025年2月13日周四 15:57写道：
>
> >  Hi Ron,
> >
> > Thx for quick response.
> >
> > - should the default value be true for the newly introduced option
> > `table.optimizer.reuse-sink-enabled`?
> >
> > I prefer to set the default value of this option as'false in the first
> > place.  Setting as true might introduce unexpected behavior for users
> when
> > operating existing jobs. Maybe we should introduce this feature at first
> > and discuss enabling this feature as default in a separated thread. WDYT?
> >
> > - have you considered the technical implementation options and are they
> > feasible?
> >
> > Yes, we have already implemented the POC internally. It works well.
> >
> > Looking forward for your feedback.
> >
> > Best,
> > Xiangyu
> >
> > Ron Liu <[email protected]> 于2025年2月13日周四 14:55写道：
> >
> > > Hi, Xiangyu
> > >
> > > Thank you for proposing this FLIP, it's great work and looks very
> useful
> > > for users.
> > >
> > > I have the following two questions regarding the content of the FLIP:
> > > 1. Since sink reuse is very useful, should the default value be true
> for
> > > the newly introduced option `table.optimizer.reuse-sink-enabled`, and
> > > should the engine enable this optimization by default. Currently for
> > source
> > > reuse, the default value of  `sql.optimizer.reuse.table-source.enabled`
> > > option is also true, which does not require user access by default, so
> I
> > > think the engine should turn on Sink reuse optimization by default.
> > > 2. Regarding Sink Digest, you mentioned disregarding the sink target
> > > column, which I think is a very good suggestion, and very useful if it
> > can
> > > be done. I have a question: have you considered the technical
> > > implementation options and are they feasible?
> > >
> > > Best,
> > > Ron
> > >
> > > xiangyu feng <[email protected]> 于2025年2月13日周四 12:56写道：
> > >
> > > > Hi all,
> > > >
> > > > Thank you all for the comments.
> > > >
> > > > If there is no further comment, I will open the voting thread in 3
> > days.
> > > >
> > > > Regards,
> > > > Xiangyu
> > > >
> > > > xiangyu feng <[email protected]> 于2025年2月11日周二 14:17写道：
> > > >
> > > > > Link for Paimon LocalMerge Operator[1]
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://paimon.apache.org/docs/master/maintenance/write-performance/#local-merging
> > > > >
> > > > > xiangyu feng <[email protected]> 于2025年2月11日周二 14:03写道：
> > > > >
> > > > >> Follow the above,
> > > > >>
> > > > >> "And for SinkWriter, the data structure to be processed should be
> > > > fixed."
> > > > >>
> > > > >> I'm not very sure why the data structure of SinkWriter should be
> > > fixed.
> > > > >> Can you elaborate the scenario here?
> > > > >>
> > > > >>  "Is there a node or an operator to fill in the inconsistent field
> > of
> > > > >> Rowdata that passed from different Sources?"
> > > > >>
> > > > >> By `filling in the inconsistent field from different sources`, do
> > you
> > > > >> refer to implementations like the LocalMerge Operator [1] for
> > Paimon?
> > > > IMHO,
> > > > >> this should not be included in the Sink Reuse. The merging
> behavior
> > of
> > > > >> multiple sources should be considered inside of the sink.
> > > > >>
> > > > >> Regards,
> > > > >> Xiangyu Feng
> > > > >>
> > > > >> xiangyu feng <[email protected]> 于2025年2月11日周二 13:46写道：
> > > > >>
> > > > >>> Hi Yanquan,
> > > > >>>
> > > > >>> Thx for reply. IIUC, the schema of CatalogTable should contain
> all
> > > > >>> target columns for sources. If not, a SQL validation exception
> > should
> > > > be
> > > > >>> raised for planner.
> > > > >>>
> > > > >>> Regards,
> > > > >>> Xiangyu Feng
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> Yanquan Lv <[email protected]> 于2025年2月10日周一 16:25写道：
> > > > >>>
> > > > >>>> Hi, Xiangyu. Thanks for driving this.
> > > > >>>>
> > > > >>>> I have a question to confirm:
> > > > >>>> Considering the case that different Sources use different
> > > columns[1],
> > > > >>>> will the Schema of CatalogTable[2] contain all target columns
> for
> > > > Sources?
> > > > >>>> And for SinkWriter, the data structure to be processed should be
> > > > fixed.
> > > > >>>> Is there a node or an operator to fill in the inconsistent field
> > of
> > > > Rowdata
> > > > >>>> that passed from different Sources?
> > > > >>>>
> > > > >>>> [1]
> > > > >>>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
> > > > >>>> [2]
> > > > >>>>
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#planning
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> > 2025年2月6日 17:06，xiangyu feng <[email protected]> 写道：
> > > > >>>> >
> > > > >>>> > Hi devs,
> > > > >>>> >
> > > > >>>> > I'm opening this thread to discuss FLIP-506: Support Reuse
> > > Multiple
> > > > >>>> Table
> > > > >>>> > Sinks in Planner[1].
> > > > >>>> >
> > > > >>>> > Currently if users want to partial-update a downstream table
> > from
> > > > >>>> multiple
> > > > >>>> > source tables in one datastream, they would have to manually
> > union
> > > > all
> > > > >>>> > source tables and add lots of "cast(null as string) as xxx" in
> > > Flink
> > > > >>>> SQL.
> > > > >>>> > This will make the SQL here hard to use and maintain.
> > > > >>>> >
> > > > >>>> > After discussing with Weijie Guo, we think that by supporting
> > > reuse
> > > > >>>> sink
> > > > >>>> > nodes in planner, the usability can be greatly improved in
> this
> > > > case.
> > > > >>>> >
> > > > >>>> > Therefore, we propose to add a new option
> > > > >>>> > *`table.optimizer.reuse-sink-enabled`* here to support this
> > > feature.
> > > > >>>> More
> > > > >>>> > details can be found in the FLIP.
> > > > >>>> >
> > > > >>>> > Looking forward to your feedback, thanks.
> > > > >>>> >
> > > > >>>> > [1]
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
> > > > >>>> >
> > > > >>>> > Best regards,
> > > > >>>> > Xiangyu Feng
> > > > >>>>
> > > > >>>>
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-506: Support Reuse Multiple Table Sinks in Planner

Reply via email to