Hi, all,

Thanks for the lively discussion.

I agree with Jiabao. I think enabling "scan.filter-push-down.enabled"
relies on enabling "table.optimizer.source.predicate-pushdown-enabled".
It is a little strange that the planner still needs to push down the
filters when we set "scan.filter-push-down.enabled=false" and
"table.optimizer.source.predicate-pushdown-enabled=true".
Maybe we need to add some checks to warn the users when setting
"scan.filter-push-down.enabled=true" and
"table.optimizer.source.predicate-pushdown-enabled=false".

Besides that, I am +1 for renaming 'scan.filter-push-down.enabled' to
'source.predicate-pushdown.enabled'.

Best,
Hang

Jiabao Sun <jiabao....@xtransfer.cn.invalid> 于2023年10月25日周三 18:23写道:

> Thanks Benchao for the feedback.
>
> I understand that the configuration of global parallelism and task
> parallelism is at different granularities but with the same configuration.
> However, "table.optimizer.source.predicate-pushdown-enabled" and
> "scan.filter-push-down.enabled" are configurations for different
> components(optimizer and source operator).
>
> From a user's perspective, there are two scenarios:
>
> 1. Disabling all filter pushdown
> In this case, setting "table.optimizer.source.predicate-pushdown-enabled"
> to false is sufficient to meet the requirement.
>
> 2. Disabling filter pushdown for specific sources
> In this scenario, there is no need to adjust the value of
> "table.optimizer.source.predicate-pushdown-enabled".
> Instead, the focus should be on the configuration of
> "scan.filter-push-down.enabled" to meet the requirement.
> In this case, users do not need to set
> "table.optimizer.source.predicate-pushdown-enabled" to false and manually
> enable filter pushdown for specific sources.
>
> Additionally, if "scan.filter-push-down.enabled" doesn't respect to
> "table.optimizer.source.predicate-pushdown-enabled" and the default value
> of "scan.filter-push-down.enabled" is defined as true,
> it means that just modifying
> "table.optimizer.source.predicate-pushdown-enabled" as false will have no
> effect, and filter pushdown will still be performed.
>
> If we define the default value of "scan.filter-push-down.enabled" as
> false, it would introduce a difference in behavior compared to the previous
> version.
> The same SQL query that could successfully push down filters in the old
> version but would no longer do so after the upgrade.
>
> Best,
> Jiabao
>
>
> > 2023年10月25日 17:10,Benchao Li <libenc...@apache.org> 写道:
> >
> > Thanks Jiabao for the detailed explanations, that helps a lot, I
> > understand your rationale now.
> >
> > Correct me if I'm wrong. Your perspective is from "developer", which
> > means there is an optimizer and connector component, and if we want to
> > enable this feature (pushing filters down into connectors), you must
> > enable it firstly in optimizer, and only then connector has the chance
> > to decide to use it or not.
> >
> > My perspective is from "user" that (Why a user should care about the
> > difference of optimizer/connector) , this is a feature, and has two
> > way to control it, one way is to config it job-level, the other one is
> > in table properties. What a user expects is that they can control a
> > feature in a tiered way, that setting it per job, and then
> > fine-grained tune it per table.
> >
> > This is some kind of similar to other concepts, such as parallelism,
> > users can set a job level default parallelism, and then fine-grained
> > tune it per operator. There may be more such debate in the future
> > e.g., we can have a job level config about adding key-by before lookup
> > join, and also a hint/table property way to fine-grained control it
> > per lookup operator. Hence we'd better find a unified way for all
> > those similar kind of features.
> >
> > Jiabao Sun <jiabao....@xtransfer.cn.invalid> 于2023年10月25日周三 15:27写道:
> >>
> >> Thanks Jane for further explanation.
> >>
> >> These two configurations correspond to different levels.
> "scan.filter-push-down.enabled" does not make
> "table.optimizer.source.predicate" invalid.
> >> The planner will still push down predicates to all sources.
> >> Whether filter pushdown is allowed or not is determined by the specific
> source's "scan.filter-push-down.enabled" configuration.
> >>
> >> However, "table.optimizer.source.predicate" does directly affect
> "scan.filter-push-down.enabled”.
> >> When the planner disables predicate pushdown, the source-level filter
> pushdown will also not be executed, even if the source allows filter
> pushdown.
> >>
> >> Whatever, in point 1 and 2, our expectation is consistent.
> >> For the 3rd point, I still think that the planner-level configuration
> takes precedence over the source-level configuration.
> >> It may seem counterintuitive when we globally disable predicate
> pushdown but allow filter pushdown at the source level.
> >>
> >> Best,
> >> Jiabao
> >>
> >>
> >>
> >>> 2023年10月25日 14:35,Jane Chan <qingyue....@gmail.com> 写道:
> >>>
> >>> Hi Jiabao,
> >>>
> >>> Thanks for clarifying this. While by "scan.filter-push-down.enabled
> takes a
> >>> higher priority" I meant that this value should be respected whenever
> it is
> >>> set explicitly.
> >>>
> >>> The conclusion that
> >>>
> >>> 2. "table.optimizer.source.predicate" = "true" and
> >>>> "scan.filter-push-down.enabled" = "false"
> >>>> Allow the planner to perform predicate pushdown, but individual
> sources do
> >>>> not enable filter pushdown.
> >>>>
> >>>
> >>> This indicates that the option "scan.filter-push-down.enabled = false"
> for
> >>> an individual source connector does indeed override the global-level
> >>> planner settings to make a difference. And thus "has a higher
> priority".
> >>>
> >>> While for
> >>>
> >>> 3. "table.optimizer.source.predicate" = "false"
> >>>> Predicate pushdown is not allowed for the planner.
> >>>> Regardless of the value of the "scan.filter-push-down.enabled"
> >>>> configuration, filter pushdown is disabled.
> >>>> In this scenario, the behavior remains consistent with the old
> version as
> >>>> well.
> >>>>
> >>>
> >>> I still think "scan.filter-push-down.enabled" should also be respected
> if
> >>> it is enabled for individual connectors. WDYT?
> >>>
> >>> Best,
> >>> Jane
> >>>
> >>> On Wed, Oct 25, 2023 at 1:27 PM Jiabao Sun <jiabao....@xtransfer.cn
> .invalid>
> >>> wrote:
> >>>
> >>>> Thanks Benchao for the feedback.
> >>>>
> >>>> For the current proposal, we recommend keeping the default value of
> >>>> "table.optimizer.source.predicate" as true,
> >>>> and setting the the default value of newly introduced option
> >>>> "scan.filter-push-down.enabled" to true as well.
> >>>>
> >>>> The main purpose of doing this is to maintain consistency with
> previous
> >>>> versions, as whether to perform
> >>>> filter pushdown in the old version solely depends on the
> >>>> "table.optimizer.source.predicate" option.
> >>>> That means by default, as long as a TableSource implements the
> >>>> SupportsFilterPushDown interface, filter pushdown is allowed.
> >>>> And it seems that we don't have much benefit in changing the default
> value
> >>>> of "table.optimizer.source.predicate" to false.
> >>>>
> >>>> Regarding the priority of these two configurations, I believe that
> >>>> "table.optimizer.source.predicate"
> >>>> takes precedence over "scan.filter-push-down.enabled" and it exhibits
> the
> >>>> following behavior.
> >>>>
> >>>> 1. "table.optimizer.source.predicate" = "true" and
> >>>> "scan.filter-push-down.enabled" = "true"
> >>>> This is the default behavior, allowing filter pushdown for sources.
> >>>>
> >>>> 2. "table.optimizer.source.predicate" = "true" and
> >>>> "scan.filter-push-down.enabled" = "false"
> >>>> Allow the planner to perform predicate pushdown, but individual
> sources do
> >>>> not enable filter pushdown.
> >>>>
> >>>> 3. "table.optimizer.source.predicate" = "false"
> >>>> Predicate pushdown is not allowed for the planner.
> >>>> Regardless of the value of the "scan.filter-push-down.enabled"
> >>>> configuration, filter pushdown is disabled.
> >>>> In this scenario, the behavior remains consistent with the old
> version as
> >>>> well.
> >>>>
> >>>>
> >>>> From an implementation perspective, setting the priority of
> >>>> "scan.filter-push-down.enabled" higher than
> >>>> "table.optimizer.source.predicate" is difficult to achieve now.
> >>>> Because the PushFilterIntoSourceScanRuleBase at the planner level
> takes
> >>>> precedence over the source-level FilterPushDownSpec.
> >>>> Only when the PushFilterIntoSourceScanRuleBase is enabled, will the
> >>>> Source-level filter pushdown be performed.
> >>>>
> >>>> Additionally, in my opinion, there doesn't seem to be much benefit in
> >>>> setting a higher priority for "scan.filter-push-down.enabled".
> >>>> It may instead affect compatibility and increase implementation
> complexity.
> >>>>
> >>>> WDYT?
> >>>>
> >>>> Best,
> >>>> Jiabao
> >>>>
> >>>>
> >>>>> 2023年10月25日 11:56,Benchao Li <libenc...@apache.org> 写道:
> >>>>>
> >>>>> I agree with Jane that fine-grained configurations should have higher
> >>>>> priority than job level configurations.
> >>>>>
> >>>>> For current proposal, we can achieve that:
> >>>>> - Set "table.optimizer.source.predicate" = "true" to enable by
> >>>>> default, and set ""scan.filter-push-down.enabled" = "false" to
> disable
> >>>>> it per table source
> >>>>> - Set "table.optimizer.source.predicate" = "false" to disable by
> >>>>> default, and set ""scan.filter-push-down.enabled" = "true" to enable
> >>>>> it per table source
> >>>>>
> >>>>> Jane Chan <qingyue....@gmail.com> 于2023年10月24日周二 23:55写道:
> >>>>>>
> >>>>>>>
> >>>>>>> I believe that the configuration "table.optimizer.source.predicate"
> >>>> has a
> >>>>>>> higher priority at the planner level than the configuration at the
> >>>> source
> >>>>>>> level,
> >>>>>>> and it seems easy to implement now.
> >>>>>>>
> >>>>>>
> >>>>>> Correct me if I'm wrong, but I think the fine-grained configuration
> >>>>>> "scan.filter-push-down.enabled" should have a higher priority
> because
> >>>> the
> >>>>>> default value of "table.optimizer.source.predicate" is true. As a
> >>>> result,
> >>>>>> turning off filter push-down for a specific source will not take
> effect
> >>>>>> unless the default value of "table.optimizer.source.predicate" is
> >>>> changed
> >>>>>> to false, or, alternatively, let users manually set
> >>>>>> "table.optimizer.source.predicate" to false first and then
> selectively
> >>>>>> enable filter push-down for the desired sources, which is less
> >>>> intuitive.
> >>>>>> WDYT?
> >>>>>>
> >>>>>> Best,
> >>>>>> Jane
> >>>>>>
> >>>>>> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun <jiabao....@xtransfer.cn
> >>>> .invalid>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks Jane,
> >>>>>>>
> >>>>>>> I believe that the configuration "table.optimizer.source.predicate"
> >>>> has a
> >>>>>>> higher priority at the planner level than the configuration at the
> >>>> source
> >>>>>>> level,
> >>>>>>> and it seems easy to implement now.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jiabao
> >>>>>>>
> >>>>>>>
> >>>>>>>> 2023年10月24日 17:36,Jane Chan <qingyue....@gmail.com> 写道:
> >>>>>>>>
> >>>>>>>> Hi Jiabao,
> >>>>>>>>
> >>>>>>>> Thanks for driving this discussion. I have a small question that
> will
> >>>>>>>> "scan.filter-push-down.enabled" take precedence over
> >>>>>>>> "table.optimizer.source.predicate" when the two parameters might
> >>>> conflict
> >>>>>>>> each other?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jane
> >>>>>>>>
> >>>>>>>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun <
> jiabao....@xtransfer.cn
> >>>>>>> .invalid>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Thanks Jark,
> >>>>>>>>>
> >>>>>>>>> If we only add configuration without adding the
> enableFilterPushDown
> >>>>>>>>> method in the SupportsFilterPushDown interface,
> >>>>>>>>> each connector would have to handle the same logic in the
> >>>> applyFilters
> >>>>>>>>> method to determine whether filter pushdown is needed.
> >>>>>>>>> This would increase complexity and violate the original behavior
> of
> >>>> the
> >>>>>>>>> applyFilters method.
> >>>>>>>>>
> >>>>>>>>> On the contrary, we only need to pass the configuration
> parameter in
> >>>> the
> >>>>>>>>> newly added enableFilterPushDown method
> >>>>>>>>> to decide whether to perform predicate pushdown.
> >>>>>>>>>
> >>>>>>>>> I think this approach would be clearer and simpler.
> >>>>>>>>> WDYT?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Jiabao
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> 2023年10月24日 16:58,Jark Wu <imj...@gmail.com> 写道:
> >>>>>>>>>>
> >>>>>>>>>> Hi JIabao,
> >>>>>>>>>>
> >>>>>>>>>> I think the current interface can already satisfy your
> requirements.
> >>>>>>>>>> The connector can reject all the filters by returning the input
> >>>> filters
> >>>>>>>>>> as `Result#remainingFilters`.
> >>>>>>>>>>
> >>>>>>>>>> So maybe we don't need to introduce a new method to disable
> >>>>>>>>>> pushdown, but just introduce an option for the specific
> connector.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Jark
> >>>>>>>>>>
> >>>>>>>>>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu <xbjt...@gmail.com>
> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks @Jiabao for kicking off this discussion.
> >>>>>>>>>>>
> >>>>>>>>>>> Could you add a section to explain the difference between
> proposed
> >>>>>>>>>>> connector level config `scan.filter-push-down.enabled` and
> existing
> >>>>>>>>> query
> >>>>>>>>>>> level config
> `table.optimizer.source.predicate-pushdown-enabled` ?
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Leonard
> >>>>>>>>>>>
> >>>>>>>>>>>> 2023年10月24日 下午4:18,Jiabao Sun <jiabao....@xtransfer.cn
> .INVALID>
> >>>> 写道:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Devs,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I would like to start a discussion on FLIP-377: support
> >>>> configuration
> >>>>>>>>> to
> >>>>>>>>>>> disable filter pushdown for Table/SQL Sources[1].
> >>>>>>>>>>>>
> >>>>>>>>>>>> Currently, Flink Table/SQL does not expose fine-grained
> control
> >>>> for
> >>>>>>>>>>> users to enable or disable filter pushdown.
> >>>>>>>>>>>> However, filter pushdown has some side effects, such as
> additional
> >>>>>>>>>>> computational pressure on external systems.
> >>>>>>>>>>>> Moreover, Improper queries can lead to issues such as full
> table
> >>>>>>> scans,
> >>>>>>>>>>> which in turn can impact the stability of external systems.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Suppose we have an SQL query with two sources: Kafka and a
> >>>> database.
> >>>>>>>>>>>> The database is sensitive to pressure, and we want to
> configure
> >>>> it to
> >>>>>>>>>>> not perform filter pushdown to the database source.
> >>>>>>>>>>>> However, we still want to perform filter pushdown to the Kafka
> >>>> source
> >>>>>>>>> to
> >>>>>>>>>>> decrease network IO.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I propose to support configuration to disable filter push
> down for
> >>>>>>>>>>> Table/SQL sources to let user decide whether to perform filter
> >>>>>>> pushdown.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Looking forward to your feedback.
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1]
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Jiabao
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Best,
> >>>>> Benchao Li
> >>>>
> >>>>
> >>
> >
> >
> > --
> >
> > Best,
> > Benchao Li
>
>

Reply via email to