Hi Jiabao,

Thanks for clarifying this. While by "scan.filter-push-down.enabled takes a
higher priority" I meant that this value should be respected whenever it is
set explicitly.

The conclusion that

2. "table.optimizer.source.predicate" = "true" and
> "scan.filter-push-down.enabled" = "false"
> Allow the planner to perform predicate pushdown, but individual sources do
> not enable filter pushdown.
>

This indicates that the option "scan.filter-push-down.enabled = false" for
an individual source connector does indeed override the global-level
planner settings to make a difference. And thus "has a higher priority".

While for

3. "table.optimizer.source.predicate" = "false"
> Predicate pushdown is not allowed for the planner.
> Regardless of the value of the "scan.filter-push-down.enabled"
> configuration, filter pushdown is disabled.
> In this scenario, the behavior remains consistent with the old version as
> well.
>

I still think "scan.filter-push-down.enabled" should also be respected if
it is enabled for individual connectors. WDYT?

Best,
Jane

On Wed, Oct 25, 2023 at 1:27 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid>
wrote:

> Thanks Benchao for the feedback.
>
> For the current proposal, we recommend keeping the default value of
> "table.optimizer.source.predicate" as true,
> and setting the the default value of newly introduced option
> "scan.filter-push-down.enabled" to true as well.
>
> The main purpose of doing this is to maintain consistency with previous
> versions, as whether to perform
> filter pushdown in the old version solely depends on the
> "table.optimizer.source.predicate" option.
> That means by default, as long as a TableSource implements the
> SupportsFilterPushDown interface, filter pushdown is allowed.
> And it seems that we don't have much benefit in changing the default value
> of "table.optimizer.source.predicate" to false.
>
> Regarding the priority of these two configurations, I believe that
> "table.optimizer.source.predicate"
> takes precedence over "scan.filter-push-down.enabled" and it exhibits the
> following behavior.
>
> 1. "table.optimizer.source.predicate" = "true" and
> "scan.filter-push-down.enabled" = "true"
> This is the default behavior, allowing filter pushdown for sources.
>
> 2. "table.optimizer.source.predicate" = "true" and
> "scan.filter-push-down.enabled" = "false"
> Allow the planner to perform predicate pushdown, but individual sources do
> not enable filter pushdown.
>
> 3. "table.optimizer.source.predicate" = "false"
> Predicate pushdown is not allowed for the planner.
> Regardless of the value of the "scan.filter-push-down.enabled"
> configuration, filter pushdown is disabled.
> In this scenario, the behavior remains consistent with the old version as
> well.
>
>
> From an implementation perspective, setting the priority of
> "scan.filter-push-down.enabled" higher than
> "table.optimizer.source.predicate" is difficult to achieve now.
> Because the PushFilterIntoSourceScanRuleBase at the planner level takes
> precedence over the source-level FilterPushDownSpec.
> Only when the PushFilterIntoSourceScanRuleBase is enabled, will the
> Source-level filter pushdown be performed.
>
> Additionally, in my opinion, there doesn't seem to be much benefit in
> setting a higher priority for "scan.filter-push-down.enabled".
> It may instead affect compatibility and increase implementation complexity.
>
> WDYT?
>
> Best,
> Jiabao
>
>
> > 2023年10月25日 11:56,Benchao Li <libenc...@apache.org> 写道:
> >
> > I agree with Jane that fine-grained configurations should have higher
> > priority than job level configurations.
> >
> > For current proposal, we can achieve that:
> > - Set "table.optimizer.source.predicate" = "true" to enable by
> > default, and set ""scan.filter-push-down.enabled" = "false" to disable
> > it per table source
> > - Set "table.optimizer.source.predicate" = "false" to disable by
> > default, and set ""scan.filter-push-down.enabled" = "true" to enable
> > it per table source
> >
> > Jane Chan <qingyue....@gmail.com> 于2023年10月24日周二 23:55写道:
> >>
> >>>
> >>> I believe that the configuration "table.optimizer.source.predicate"
> has a
> >>> higher priority at the planner level than the configuration at the
> source
> >>> level,
> >>> and it seems easy to implement now.
> >>>
> >>
> >> Correct me if I'm wrong, but I think the fine-grained configuration
> >> "scan.filter-push-down.enabled" should have a higher priority because
> the
> >> default value of "table.optimizer.source.predicate" is true. As a
> result,
> >> turning off filter push-down for a specific source will not take effect
> >> unless the default value of "table.optimizer.source.predicate" is
> changed
> >> to false, or, alternatively, let users manually set
> >> "table.optimizer.source.predicate" to false first and then selectively
> >> enable filter push-down for the desired sources, which is less
> intuitive.
> >> WDYT?
> >>
> >> Best,
> >> Jane
> >>
> >> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun <jiabao....@xtransfer.cn
> .invalid>
> >> wrote:
> >>
> >>> Thanks Jane,
> >>>
> >>> I believe that the configuration "table.optimizer.source.predicate"
> has a
> >>> higher priority at the planner level than the configuration at the
> source
> >>> level,
> >>> and it seems easy to implement now.
> >>>
> >>> Best,
> >>> Jiabao
> >>>
> >>>
> >>>> 2023年10月24日 17:36,Jane Chan <qingyue....@gmail.com> 写道:
> >>>>
> >>>> Hi Jiabao,
> >>>>
> >>>> Thanks for driving this discussion. I have a small question that will
> >>>> "scan.filter-push-down.enabled" take precedence over
> >>>> "table.optimizer.source.predicate" when the two parameters might
> conflict
> >>>> each other?
> >>>>
> >>>> Best,
> >>>> Jane
> >>>>
> >>>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun <jiabao....@xtransfer.cn
> >>> .invalid>
> >>>> wrote:
> >>>>
> >>>>> Thanks Jark,
> >>>>>
> >>>>> If we only add configuration without adding the enableFilterPushDown
> >>>>> method in the SupportsFilterPushDown interface,
> >>>>> each connector would have to handle the same logic in the
> applyFilters
> >>>>> method to determine whether filter pushdown is needed.
> >>>>> This would increase complexity and violate the original behavior of
> the
> >>>>> applyFilters method.
> >>>>>
> >>>>> On the contrary, we only need to pass the configuration parameter in
> the
> >>>>> newly added enableFilterPushDown method
> >>>>> to decide whether to perform predicate pushdown.
> >>>>>
> >>>>> I think this approach would be clearer and simpler.
> >>>>> WDYT?
> >>>>>
> >>>>> Best,
> >>>>> Jiabao
> >>>>>
> >>>>>
> >>>>>> 2023年10月24日 16:58,Jark Wu <imj...@gmail.com> 写道:
> >>>>>>
> >>>>>> Hi JIabao,
> >>>>>>
> >>>>>> I think the current interface can already satisfy your requirements.
> >>>>>> The connector can reject all the filters by returning the input
> filters
> >>>>>> as `Result#remainingFilters`.
> >>>>>>
> >>>>>> So maybe we don't need to introduce a new method to disable
> >>>>>> pushdown, but just introduce an option for the specific connector.
> >>>>>>
> >>>>>> Best,
> >>>>>> Jark
> >>>>>>
> >>>>>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu <xbjt...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Thanks @Jiabao for kicking off this discussion.
> >>>>>>>
> >>>>>>> Could you add a section to explain the difference between proposed
> >>>>>>> connector level config `scan.filter-push-down.enabled` and existing
> >>>>> query
> >>>>>>> level config `table.optimizer.source.predicate-pushdown-enabled` ?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Leonard
> >>>>>>>
> >>>>>>>> 2023年10月24日 下午4:18,Jiabao Sun <jiabao....@xtransfer.cn.INVALID>
> 写道:
> >>>>>>>>
> >>>>>>>> Hi Devs,
> >>>>>>>>
> >>>>>>>> I would like to start a discussion on FLIP-377: support
> configuration
> >>>>> to
> >>>>>>> disable filter pushdown for Table/SQL Sources[1].
> >>>>>>>>
> >>>>>>>> Currently, Flink Table/SQL does not expose fine-grained control
> for
> >>>>>>> users to enable or disable filter pushdown.
> >>>>>>>> However, filter pushdown has some side effects, such as additional
> >>>>>>> computational pressure on external systems.
> >>>>>>>> Moreover, Improper queries can lead to issues such as full table
> >>> scans,
> >>>>>>> which in turn can impact the stability of external systems.
> >>>>>>>>
> >>>>>>>> Suppose we have an SQL query with two sources: Kafka and a
> database.
> >>>>>>>> The database is sensitive to pressure, and we want to configure
> it to
> >>>>>>> not perform filter pushdown to the database source.
> >>>>>>>> However, we still want to perform filter pushdown to the Kafka
> source
> >>>>> to
> >>>>>>> decrease network IO.
> >>>>>>>>
> >>>>>>>> I propose to support configuration to disable filter push down for
> >>>>>>> Table/SQL sources to let user decide whether to perform filter
> >>> pushdown.
> >>>>>>>>
> >>>>>>>> Looking forward to your feedback.
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>
> >>>>>
> >>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jiabao
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
>
>

Reply via email to