Re: [DISCUSS] FLIP-506: Support Reuse Multiple Table Sinks in Planner

Ron Liu Thu, 13 Feb 2025 20:10:41 -0800

Hi, Xiangyu

>>> I prefer to set the default value of this option as'false in the first
place.  Setting as true might introduce unexpected behavior for users when
operating existing jobs. Maybe we should introduce this feature at first
and discuss enabling this feature as default in a separated thread. WDYT?


1. What unexpected behaviors do you think this might introduce?  For Sink
nodes, which are generally stateless, I intuitively understand that no
state compatibility issues will be introduced after Sink reuse.

2. Since Sink reuse benefits users, why not enable this feature by default
on the first day it is introduced? If your concern is potential unhandled
corner cases in the implementation, I consider those to be bugs. We should
prioritize fixing them rather than blocking the default enablement of this
optimization.

3. If we don't enable it by default now, when should we? What specific
milestones or actions are needed during the waiting period?  Your concerns
about unintended behaviors would still exist even if we enable it later.
Why delay resolving this in a separate discussion instead of finalizing it
here?

4. From our internal practice, users still want to enjoy the benefits of
this feature by default.


Best,
Ron

xiangyu feng <[email protected]> 于2025年2月13日周四 15:57写道：

>  Hi Ron,
>
> Thx for quick response.
>
> - should the default value be true for the newly introduced option
> `table.optimizer.reuse-sink-enabled`?
>
> I prefer to set the default value of this option as'false in the first
> place.  Setting as true might introduce unexpected behavior for users when
> operating existing jobs. Maybe we should introduce this feature at first
> and discuss enabling this feature as default in a separated thread. WDYT?
>
> - have you considered the technical implementation options and are they
> feasible?
>
> Yes, we have already implemented the POC internally. It works well.
>
> Looking forward for your feedback.
>
> Best,
> Xiangyu
>
> Ron Liu <[email protected]> 于2025年2月13日周四 14:55写道：
>
> > Hi, Xiangyu
> >
> > Thank you for proposing this FLIP, it's great work and looks very useful
> > for users.
> >
> > I have the following two questions regarding the content of the FLIP:
> > 1. Since sink reuse is very useful, should the default value be true for
> > the newly introduced option `table.optimizer.reuse-sink-enabled`, and
> > should the engine enable this optimization by default. Currently for
> source
> > reuse, the default value of  `sql.optimizer.reuse.table-source.enabled`
> > option is also true, which does not require user access by default, so I
> > think the engine should turn on Sink reuse optimization by default.
> > 2. Regarding Sink Digest, you mentioned disregarding the sink target
> > column, which I think is a very good suggestion, and very useful if it
> can
> > be done. I have a question: have you considered the technical
> > implementation options and are they feasible?
> >
> > Best,
> > Ron
> >
> > xiangyu feng <[email protected]> 于2025年2月13日周四 12:56写道：
> >
> > > Hi all,
> > >
> > > Thank you all for the comments.
> > >
> > > If there is no further comment, I will open the voting thread in 3
> days.
> > >
> > > Regards,
> > > Xiangyu
> > >
> > > xiangyu feng <[email protected]> 于2025年2月11日周二 14:17写道：
> > >
> > > > Link for Paimon LocalMerge Operator[1]
> > > >
> > > > [1]
> > > >
> > >
> >
> https://paimon.apache.org/docs/master/maintenance/write-performance/#local-merging
> > > >
> > > > xiangyu feng <[email protected]> 于2025年2月11日周二 14:03写道：
> > > >
> > > >> Follow the above,
> > > >>
> > > >> "And for SinkWriter, the data structure to be processed should be
> > > fixed."
> > > >>
> > > >> I'm not very sure why the data structure of SinkWriter should be
> > fixed.
> > > >> Can you elaborate the scenario here?
> > > >>
> > > >>  "Is there a node or an operator to fill in the inconsistent field
> of
> > > >> Rowdata that passed from different Sources?"
> > > >>
> > > >> By `filling in the inconsistent field from different sources`, do
> you
> > > >> refer to implementations like the LocalMerge Operator [1] for
> Paimon?
> > > IMHO,
> > > >> this should not be included in the Sink Reuse. The merging behavior
> of
> > > >> multiple sources should be considered inside of the sink.
> > > >>
> > > >> Regards,
> > > >> Xiangyu Feng
> > > >>
> > > >> xiangyu feng <[email protected]> 于2025年2月11日周二 13:46写道：
> > > >>
> > > >>> Hi Yanquan,
> > > >>>
> > > >>> Thx for reply. IIUC, the schema of CatalogTable should contain all
> > > >>> target columns for sources. If not, a SQL validation exception
> should
> > > be
> > > >>> raised for planner.
> > > >>>
> > > >>> Regards,
> > > >>> Xiangyu Feng
> > > >>>
> > > >>>
> > > >>>
> > > >>> Yanquan Lv <[email protected]> 于2025年2月10日周一 16:25写道：
> > > >>>
> > > >>>> Hi, Xiangyu. Thanks for driving this.
> > > >>>>
> > > >>>> I have a question to confirm:
> > > >>>> Considering the case that different Sources use different
> > columns[1],
> > > >>>> will the Schema of CatalogTable[2] contain all target columns for
> > > Sources?
> > > >>>> And for SinkWriter, the data structure to be processed should be
> > > fixed.
> > > >>>> Is there a node or an operator to fill in the inconsistent field
> of
> > > Rowdata
> > > >>>> that passed from different Sources?
> > > >>>>
> > > >>>> [1]
> > > >>>>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
> > > >>>> [2]
> > > >>>>
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#planning
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> > 2025年2月6日 17:06，xiangyu feng <[email protected]> 写道：
> > > >>>> >
> > > >>>> > Hi devs,
> > > >>>> >
> > > >>>> > I'm opening this thread to discuss FLIP-506: Support Reuse
> > Multiple
> > > >>>> Table
> > > >>>> > Sinks in Planner[1].
> > > >>>> >
> > > >>>> > Currently if users want to partial-update a downstream table
> from
> > > >>>> multiple
> > > >>>> > source tables in one datastream, they would have to manually
> union
> > > all
> > > >>>> > source tables and add lots of "cast(null as string) as xxx" in
> > Flink
> > > >>>> SQL.
> > > >>>> > This will make the SQL here hard to use and maintain.
> > > >>>> >
> > > >>>> > After discussing with Weijie Guo, we think that by supporting
> > reuse
> > > >>>> sink
> > > >>>> > nodes in planner, the usability can be greatly improved in this
> > > case.
> > > >>>> >
> > > >>>> > Therefore, we propose to add a new option
> > > >>>> > *`table.optimizer.reuse-sink-enabled`* here to support this
> > feature.
> > > >>>> More
> > > >>>> > details can be found in the FLIP.
> > > >>>> >
> > > >>>> > Looking forward to your feedback, thanks.
> > > >>>> >
> > > >>>> > [1]
> > > >>>> >
> > > >>>>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner
> > > >>>> >
> > > >>>> > Best regards,
> > > >>>> > Xiangyu Feng
> > > >>>>
> > > >>>>
> > >
> >
>

Re: [DISCUSS] FLIP-506: Support Reuse Multiple Table Sinks in Planner

Reply via email to