Thank you all for the lively discussion!

Agree with Benchao that from a user's (rather than a developer's) point of
view, it's easier to understand that fine-grained options override global
options.

In addition, for the new option 'scan.filter-push-down.enabled', would it
be
better to keep the name consistent with the global option
'table.optimizer.source.predicate-pushdown-enabled' ?
e.g. 'source.predicate-pushdown.enabled' (here, source contains
LookupTableSource and ScanTableSource)

Best,
Lincoln Lee


Benchao Li <libenc...@apache.org> 于2023年10月25日周三 17:11写道:

> Thanks Jiabao for the detailed explanations, that helps a lot, I
> understand your rationale now.
>
> Correct me if I'm wrong. Your perspective is from "developer", which
> means there is an optimizer and connector component, and if we want to
> enable this feature (pushing filters down into connectors), you must
> enable it firstly in optimizer, and only then connector has the chance
> to decide to use it or not.
>
> My perspective is from "user" that (Why a user should care about the
> difference of optimizer/connector) , this is a feature, and has two
> way to control it, one way is to config it job-level, the other one is
> in table properties. What a user expects is that they can control a
> feature in a tiered way, that setting it per job, and then
> fine-grained tune it per table.
>
> This is some kind of similar to other concepts, such as parallelism,
> users can set a job level default parallelism, and then fine-grained
> tune it per operator. There may be more such debate in the future
> e.g., we can have a job level config about adding key-by before lookup
> join, and also a hint/table property way to fine-grained control it
> per lookup operator. Hence we'd better find a unified way for all
> those similar kind of features.
>
> Jiabao Sun <jiabao....@xtransfer.cn.invalid> 于2023年10月25日周三 15:27写道:
> >
> > Thanks Jane for further explanation.
> >
> > These two configurations correspond to different levels.
> "scan.filter-push-down.enabled" does not make
> "table.optimizer.source.predicate" invalid.
> > The planner will still push down predicates to all sources.
> > Whether filter pushdown is allowed or not is determined by the specific
> source's "scan.filter-push-down.enabled" configuration.
> >
> > However, "table.optimizer.source.predicate" does directly affect
> "scan.filter-push-down.enabled”.
> > When the planner disables predicate pushdown, the source-level filter
> pushdown will also not be executed, even if the source allows filter
> pushdown.
> >
> > Whatever, in point 1 and 2, our expectation is consistent.
> > For the 3rd point, I still think that the planner-level configuration
> takes precedence over the source-level configuration.
> > It may seem counterintuitive when we globally disable predicate pushdown
> but allow filter pushdown at the source level.
> >
> > Best,
> > Jiabao
> >
> >
> >
> > > 2023年10月25日 14:35,Jane Chan <qingyue....@gmail.com> 写道:
> > >
> > > Hi Jiabao,
> > >
> > > Thanks for clarifying this. While by "scan.filter-push-down.enabled
> takes a
> > > higher priority" I meant that this value should be respected whenever
> it is
> > > set explicitly.
> > >
> > > The conclusion that
> > >
> > > 2. "table.optimizer.source.predicate" = "true" and
> > >> "scan.filter-push-down.enabled" = "false"
> > >> Allow the planner to perform predicate pushdown, but individual
> sources do
> > >> not enable filter pushdown.
> > >>
> > >
> > > This indicates that the option "scan.filter-push-down.enabled = false"
> for
> > > an individual source connector does indeed override the global-level
> > > planner settings to make a difference. And thus "has a higher
> priority".
> > >
> > > While for
> > >
> > > 3. "table.optimizer.source.predicate" = "false"
> > >> Predicate pushdown is not allowed for the planner.
> > >> Regardless of the value of the "scan.filter-push-down.enabled"
> > >> configuration, filter pushdown is disabled.
> > >> In this scenario, the behavior remains consistent with the old
> version as
> > >> well.
> > >>
> > >
> > > I still think "scan.filter-push-down.enabled" should also be respected
> if
> > > it is enabled for individual connectors. WDYT?
> > >
> > > Best,
> > > Jane
> > >
> > > On Wed, Oct 25, 2023 at 1:27 PM Jiabao Sun <jiabao....@xtransfer.cn
> .invalid>
> > > wrote:
> > >
> > >> Thanks Benchao for the feedback.
> > >>
> > >> For the current proposal, we recommend keeping the default value of
> > >> "table.optimizer.source.predicate" as true,
> > >> and setting the the default value of newly introduced option
> > >> "scan.filter-push-down.enabled" to true as well.
> > >>
> > >> The main purpose of doing this is to maintain consistency with
> previous
> > >> versions, as whether to perform
> > >> filter pushdown in the old version solely depends on the
> > >> "table.optimizer.source.predicate" option.
> > >> That means by default, as long as a TableSource implements the
> > >> SupportsFilterPushDown interface, filter pushdown is allowed.
> > >> And it seems that we don't have much benefit in changing the default
> value
> > >> of "table.optimizer.source.predicate" to false.
> > >>
> > >> Regarding the priority of these two configurations, I believe that
> > >> "table.optimizer.source.predicate"
> > >> takes precedence over "scan.filter-push-down.enabled" and it exhibits
> the
> > >> following behavior.
> > >>
> > >> 1. "table.optimizer.source.predicate" = "true" and
> > >> "scan.filter-push-down.enabled" = "true"
> > >> This is the default behavior, allowing filter pushdown for sources.
> > >>
> > >> 2. "table.optimizer.source.predicate" = "true" and
> > >> "scan.filter-push-down.enabled" = "false"
> > >> Allow the planner to perform predicate pushdown, but individual
> sources do
> > >> not enable filter pushdown.
> > >>
> > >> 3. "table.optimizer.source.predicate" = "false"
> > >> Predicate pushdown is not allowed for the planner.
> > >> Regardless of the value of the "scan.filter-push-down.enabled"
> > >> configuration, filter pushdown is disabled.
> > >> In this scenario, the behavior remains consistent with the old
> version as
> > >> well.
> > >>
> > >>
> > >> From an implementation perspective, setting the priority of
> > >> "scan.filter-push-down.enabled" higher than
> > >> "table.optimizer.source.predicate" is difficult to achieve now.
> > >> Because the PushFilterIntoSourceScanRuleBase at the planner level
> takes
> > >> precedence over the source-level FilterPushDownSpec.
> > >> Only when the PushFilterIntoSourceScanRuleBase is enabled, will the
> > >> Source-level filter pushdown be performed.
> > >>
> > >> Additionally, in my opinion, there doesn't seem to be much benefit in
> > >> setting a higher priority for "scan.filter-push-down.enabled".
> > >> It may instead affect compatibility and increase implementation
> complexity.
> > >>
> > >> WDYT?
> > >>
> > >> Best,
> > >> Jiabao
> > >>
> > >>
> > >>> 2023年10月25日 11:56,Benchao Li <libenc...@apache.org> 写道:
> > >>>
> > >>> I agree with Jane that fine-grained configurations should have higher
> > >>> priority than job level configurations.
> > >>>
> > >>> For current proposal, we can achieve that:
> > >>> - Set "table.optimizer.source.predicate" = "true" to enable by
> > >>> default, and set ""scan.filter-push-down.enabled" = "false" to
> disable
> > >>> it per table source
> > >>> - Set "table.optimizer.source.predicate" = "false" to disable by
> > >>> default, and set ""scan.filter-push-down.enabled" = "true" to enable
> > >>> it per table source
> > >>>
> > >>> Jane Chan <qingyue....@gmail.com> 于2023年10月24日周二 23:55写道:
> > >>>>
> > >>>>>
> > >>>>> I believe that the configuration "table.optimizer.source.predicate"
> > >> has a
> > >>>>> higher priority at the planner level than the configuration at the
> > >> source
> > >>>>> level,
> > >>>>> and it seems easy to implement now.
> > >>>>>
> > >>>>
> > >>>> Correct me if I'm wrong, but I think the fine-grained configuration
> > >>>> "scan.filter-push-down.enabled" should have a higher priority
> because
> > >> the
> > >>>> default value of "table.optimizer.source.predicate" is true. As a
> > >> result,
> > >>>> turning off filter push-down for a specific source will not take
> effect
> > >>>> unless the default value of "table.optimizer.source.predicate" is
> > >> changed
> > >>>> to false, or, alternatively, let users manually set
> > >>>> "table.optimizer.source.predicate" to false first and then
> selectively
> > >>>> enable filter push-down for the desired sources, which is less
> > >> intuitive.
> > >>>> WDYT?
> > >>>>
> > >>>> Best,
> > >>>> Jane
> > >>>>
> > >>>> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun <jiabao....@xtransfer.cn
> > >> .invalid>
> > >>>> wrote:
> > >>>>
> > >>>>> Thanks Jane,
> > >>>>>
> > >>>>> I believe that the configuration "table.optimizer.source.predicate"
> > >> has a
> > >>>>> higher priority at the planner level than the configuration at the
> > >> source
> > >>>>> level,
> > >>>>> and it seems easy to implement now.
> > >>>>>
> > >>>>> Best,
> > >>>>> Jiabao
> > >>>>>
> > >>>>>
> > >>>>>> 2023年10月24日 17:36,Jane Chan <qingyue....@gmail.com> 写道:
> > >>>>>>
> > >>>>>> Hi Jiabao,
> > >>>>>>
> > >>>>>> Thanks for driving this discussion. I have a small question that
> will
> > >>>>>> "scan.filter-push-down.enabled" take precedence over
> > >>>>>> "table.optimizer.source.predicate" when the two parameters might
> > >> conflict
> > >>>>>> each other?
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Jane
> > >>>>>>
> > >>>>>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun <
> jiabao....@xtransfer.cn
> > >>>>> .invalid>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Thanks Jark,
> > >>>>>>>
> > >>>>>>> If we only add configuration without adding the
> enableFilterPushDown
> > >>>>>>> method in the SupportsFilterPushDown interface,
> > >>>>>>> each connector would have to handle the same logic in the
> > >> applyFilters
> > >>>>>>> method to determine whether filter pushdown is needed.
> > >>>>>>> This would increase complexity and violate the original behavior
> of
> > >> the
> > >>>>>>> applyFilters method.
> > >>>>>>>
> > >>>>>>> On the contrary, we only need to pass the configuration
> parameter in
> > >> the
> > >>>>>>> newly added enableFilterPushDown method
> > >>>>>>> to decide whether to perform predicate pushdown.
> > >>>>>>>
> > >>>>>>> I think this approach would be clearer and simpler.
> > >>>>>>> WDYT?
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Jiabao
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> 2023年10月24日 16:58,Jark Wu <imj...@gmail.com> 写道:
> > >>>>>>>>
> > >>>>>>>> Hi JIabao,
> > >>>>>>>>
> > >>>>>>>> I think the current interface can already satisfy your
> requirements.
> > >>>>>>>> The connector can reject all the filters by returning the input
> > >> filters
> > >>>>>>>> as `Result#remainingFilters`.
> > >>>>>>>>
> > >>>>>>>> So maybe we don't need to introduce a new method to disable
> > >>>>>>>> pushdown, but just introduce an option for the specific
> connector.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Jark
> > >>>>>>>>
> > >>>>>>>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu <xbjt...@gmail.com>
> wrote:
> > >>>>>>>>
> > >>>>>>>>> Thanks @Jiabao for kicking off this discussion.
> > >>>>>>>>>
> > >>>>>>>>> Could you add a section to explain the difference between
> proposed
> > >>>>>>>>> connector level config `scan.filter-push-down.enabled` and
> existing
> > >>>>>>> query
> > >>>>>>>>> level config
> `table.optimizer.source.predicate-pushdown-enabled` ?
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Leonard
> > >>>>>>>>>
> > >>>>>>>>>> 2023年10月24日 下午4:18,Jiabao Sun <jiabao....@xtransfer.cn
> .INVALID>
> > >> 写道:
> > >>>>>>>>>>
> > >>>>>>>>>> Hi Devs,
> > >>>>>>>>>>
> > >>>>>>>>>> I would like to start a discussion on FLIP-377: support
> > >> configuration
> > >>>>>>> to
> > >>>>>>>>> disable filter pushdown for Table/SQL Sources[1].
> > >>>>>>>>>>
> > >>>>>>>>>> Currently, Flink Table/SQL does not expose fine-grained
> control
> > >> for
> > >>>>>>>>> users to enable or disable filter pushdown.
> > >>>>>>>>>> However, filter pushdown has some side effects, such as
> additional
> > >>>>>>>>> computational pressure on external systems.
> > >>>>>>>>>> Moreover, Improper queries can lead to issues such as full
> table
> > >>>>> scans,
> > >>>>>>>>> which in turn can impact the stability of external systems.
> > >>>>>>>>>>
> > >>>>>>>>>> Suppose we have an SQL query with two sources: Kafka and a
> > >> database.
> > >>>>>>>>>> The database is sensitive to pressure, and we want to
> configure
> > >> it to
> > >>>>>>>>> not perform filter pushdown to the database source.
> > >>>>>>>>>> However, we still want to perform filter pushdown to the Kafka
> > >> source
> > >>>>>>> to
> > >>>>>>>>> decrease network IO.
> > >>>>>>>>>>
> > >>>>>>>>>> I propose to support configuration to disable filter push
> down for
> > >>>>>>>>> Table/SQL sources to let user decide whether to perform filter
> > >>>>> pushdown.
> > >>>>>>>>>>
> > >>>>>>>>>> Looking forward to your feedback.
> > >>>>>>>>>>
> > >>>>>>>>>> [1]
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Jiabao
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>>
> > >>> Best,
> > >>> Benchao Li
> > >>
> > >>
> >
>
>
> --
>
> Best,
> Benchao Li
>

Reply via email to