Hi Becket,

Actually, for FileSystemSource, filter pushdown is not always desired: only the
ORC file format supports filter pushdown.

We can disable predicate pushdown for FileSystemSource by setting 
'table.optimizer.source.predicate-pushdown-enabled' to false. 
I think we can also disable filter pushdown at a more granular level through 
fine-grained configuration.
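
To illustrate what I mean by fine-grained configuration, here is a minimal,
self-contained sketch. It does not use Flink's classes: FilterResult,
FileSourceSketch and the pushDownEnabled flag are hypothetical stand-ins that
only mirror the acceptedFilters/remainingFilters result of
SupportsFilterPushDown#applyFilters, and the "only ORC" check is the assumption
stated above, not the actual FileSystemTableSource logic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the result of SupportsFilterPushDown#applyFilters:
// the filters the source takes over, and the filters the planner must still apply.
final class FilterResult {
    final List<String> acceptedFilters;
    final List<String> remainingFilters;

    FilterResult(List<String> acceptedFilters, List<String> remainingFilters) {
        this.acceptedFilters = Collections.unmodifiableList(new ArrayList<>(acceptedFilters));
        this.remainingFilters = Collections.unmodifiableList(new ArrayList<>(remainingFilters));
    }
}

// Hypothetical file source. Filters are plain strings here, not Flink expressions.
final class FileSourceSketch {
    private final String format;           // e.g. "orc" or "csv"
    private final boolean pushDownEnabled; // assumed per-source option

    FileSourceSketch(String format, boolean pushDownEnabled) {
        this.format = format;
        this.pushDownEnabled = pushDownEnabled;
    }

    FilterResult applyFilters(List<String> filters) {
        // Assumption from this thread: only the ORC format can use pushed-down filters.
        boolean canPushDown = pushDownEnabled && "orc".equalsIgnoreCase(format);
        if (canPushDown) {
            // The source evaluates every filter; nothing is left for the planner.
            return new FilterResult(filters, Collections.<String>emptyList());
        }
        // Hand every filter back so the planner evaluates them after the scan.
        return new FilterResult(Collections.<String>emptyList(), filters);
    }
}
```

With this shape, the optimizer still calls applyFilters on every source, and a
source that is disabled (or whose format cannot use the filters) simply returns
all of them as remaining filters.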


Best,
Jiabao


> On Oct 31, 2023, at 16:50, Becket Qin <becket....@gmail.com> wrote:
> 
> Hi Jiabao,
> 
> Thanks for the explanation. Maybe it's easier to explain with an example.
> 
> Let's take FileSystemTableSource as an example. Currently it implements
> SupportsFilterPushDown interface. With your proposal, does it have to
> support `source.filter-push-down.enabled` as well? But this configuration
> does not quite make sense for the FileSystemTableSource because filter
> pushdown is always desired. However, because this configuration is a part
> of the SupportsFilterPushDown interface (which sounds confusing to begin
> with), the FileSystemTableSource can only do one of the following:
> 
> 1. Ignore the user configuration to always apply the pushed down filters -
> this is an apparent anti-pattern because a configuration should always do
> what it says.
> 2. Throw an exception telling users that this configuration is not
> applicable to the FileSystemTableSource.
> 3. Implement this configuration to push back the pushed down filters, even
> though this is never desired.
> 
> All of the above options look awkward. I am curious what your solution is
> here?
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> On Tue, Oct 31, 2023 at 3:11 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid>
> wrote:
> 
>> Thanks Becket for the further explanation.
>> 
>> Perhaps I didn't explain it clearly.
>> 
>> 1. If a source does not implement the SupportsFilterPushDown interface,
>> the newly added configurations do not need to be added to either the
>> requiredOptions or optionalOptions.
>> Similar to LookupOptions, if a source does not implement
>> LookupTableSource, there is no need to add LookupOptions to either
>> requiredOptions or optionalOptions.
>> 
>> 2. "And these configs are specific to those sources, instead of common
>> configs."
>> The newly introduced configurations define standardized names and default
>> values.
>> They still belong to the configuration at the individual source level.
>> The purpose is to avoid scattered configuration items when different
>> sources implement the same logic.
>> Whether a source should accept these configurations is determined by the
>> source's Factory.
>> 
>> Best,
>> Jiabao
>> 
>> 
>>> On Oct 31, 2023, at 13:47, Becket Qin <becket....@gmail.com> wrote:
>>> 
>>> Hi Jiabao,
>>> 
>>> Please see the replies inline.
>>> 
>>>> Introducing common configurations does not mean that all sources must
>>>> accept these configuration options.
>>>> The configuration options supported by a source are determined by the
>>>> requiredOptions and optionalOptions in the Factory interface.
>>> 
>>> This is not true. Both required and optional options are SUPPORTED. That
>>> means they are implemented, and if one specifies an optional config it will
>>> still take effect. An optional config is "optional" because the
>>> configuration has a default value. Hence, it is OK that users do not
>>> specify their own value. In other words, it is "optional" for the end
>>> users to set the config, but the implementation and support for that config
>>> is NOT optional. In case a source does not support a common config, an
>>> exception must be thrown when the config is provided by the end users.
>>> However, the config we are talking about in this FLIP is a common config
>>> that is optional to implement, meaning that sometimes the claimed behavior
>>> won't be there even if users specify that config.
>>> 
>>>> Similar to sources that do not implement the LookupTableSource interface,
>>>> sources that do not implement the SupportsFilterPushDown interface also do
>>>> not need to accept newly introduced options.
>>> 
>>> First of all, filter pushdown is a behavior of the query optimizer, not the
>>> behavior of Sources. A Source tells the optimizer that it has the
>>> ability to accept pushed down filters by implementing the
>>> SupportsFilterPushDown interface. And this is the only contract between the
>>> Source and Optimizer regarding whether filters should be pushed down. As
>>> long as a specific source implements this decorative interface, filter
>>> pushdown should always take place, i.e.
>>> SupportsFilterPushDown.applyFilters() will be called. There should be no
>>> other config to disable that call. However, Sources can decide how to
>>> behave based on their own configurations after applyFilters() is called.
>>> And these configs are specific to those sources, instead of common configs.
>>> Please see the examples I mentioned in the previous email.
>>> 
>>> Thanks,
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>> On Tue, Oct 31, 2023 at 10:27 AM Jiabao Sun <jiabao....@xtransfer.cn.invalid>
>>> wrote:
>>> 
>>>> Hi Becket,
>>>> 
>>>> Sorry, there was a typo in the second point. Let me correct it:
>>>> 
>>>> Introducing common configurations does not mean that all sources must
>>>> accept these configuration options.
>>>> The configuration options supported by a source are determined by the
>>>> requiredOptions and optionalOptions in the Factory interface.
>>>> 
>>>> Similar to sources that do not implement the LookupTableSource interface,
>>>> sources that do not implement the SupportsFilterPushDown interface also do
>>>> not need to accept newly introduced options.
>>>> 
>>>> Best,
>>>> Jiabao
>>>> 
>>>> 
>>>>> On Oct 31, 2023, at 10:13, Jiabao Sun <jiabao....@xtransfer.cn.INVALID> wrote:
>>>>> 
>>>>> Thanks Becket for the feedback.
>>>>> 
>>>>> 1. Currently, the SupportsFilterPushDown#applyFilters method returns a
>>>>> result that includes acceptedFilters and remainingFilters. The source can
>>>>> decide to push down some filters or not accept any of them.
>>>>> 2. Introducing common configuration options does not mean that a source
>>>>> that supports the SupportsFilterPushDown capability must accept this
>>>>> configuration. Similar to LookupOptions, only sources that implement the
>>>>> LookupTableSource interface need to accept these configuration
>>>>> options.
>>>>> 
>>>>> Best,
>>>>> Jiabao
>>>>> 
>>>>> 
>>>>>> On Oct 31, 2023, at 07:49, Becket Qin <becket....@gmail.com> wrote:
>>>>>> 
>>>>>> Hi Jiabao and Ruanhang,
>>>>>> 
>>>>>> Adding a configuration of source.filter-push-down.enabled as a common
>>>>>> source configuration seems problematic.
>>>>>> 1. The config name is misleading. Filter pushdown should only be determined
>>>>>> by whether the SupportsFilterPushDown interface is implemented or not.
>>>>>> 2. The behavior of this configuration is only applicable to some source
>>>>>> implementations. Why is it a common configuration?
>>>>>> 
>>>>>> Here's my suggestion for design principles:
>>>>>> 1. Only add source impl specific configuration to corresponding sources.
>>>>>> 2. The configuration name should not overrule existing common contracts.
>>>>>> 
>>>>>> For example, in the case of the MySQL source, there are several options:
>>>>>> 1. Have a configuration of `mysql.avoid.remote.full.table.scan`. If this
>>>>>> configuration is set, and a filter pushdown does not hit an index, the
>>>>>> MySQL source impl would not further push down the filter to the MySQL
>>>>>> servers. Note that this assumes the MySQL source can retrieve the index
>>>>>> information from the MySQL servers.
>>>>>> 2. If the MySQL index information is not available to the MySQL source, the
>>>>>> configuration could be something like `mysql.pushback.pushed.down.filters`.
>>>>>> Once set to true, the MySQL source would just add all the filters to the
>>>>>> remainingFilters in the Result returned by
>>>>>> SupportsFilterPushDown.applyFilters().
>>>>>> 3. An alternative to option 2 is to have a
>>>>>> `mysql.apply.predicates.after.scan`. When it is set to true, the MySQL
>>>>>> source will not push the filter down to the MySQL servers, but apply the
>>>>>> filters inside the MySQL source itself.
>>>>>> 
>>>>>> As you may see, the above configurations do not disable filter pushdown
>>>>>> itself. They just allow various implementations of filter pushdown. And the
>>>>>> configuration name does not give any illusion that filter pushdown is
>>>>>> disabled.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Jiangjie (Becket) Qin
>>>>>> 
>>>>>> On Mon, Oct 30, 2023 at 11:58 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid>
>>>>>> wrote:
>>>>>> 
>>>>>>> Thanks Hang for the suggestion.
>>>>>>> 
>>>>>>> 
>>>>>>> I think the configuration of TableSource is not closely related to
>>>>>>> SourceReader,
>>>>>>> so I prefer to introduce an independent configuration class
>>>>>>> TableSourceOptions in the flink-table-common module, similar to
>>>>>>> LookupOptions.
>>>>>>> 
>>>>>>> For the second point, I suggest adding Javadoc to the SupportsXXXPushDown
>>>>>>> interfaces, providing detailed information on the options that need to
>>>>>>> be supported.
>>>>>>> 
>>>>>>> I have made updates in the FLIP document.
>>>>>>> Please help check it again.
>>>>>>> 
>>>>>>> 
>>>>>>> Best,
>>>>>>> Jiabao
>>>>>>> 
>>>>>>> 
>>>>>>>> On Oct 30, 2023, at 17:23, Hang Ruan <ruanhang1...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Thanks for the improvements, Jiabao.
>>>>>>>> 
>>>>>>>> There are some details that I am not sure about.
>>>>>>>> 1. The new option `source.filter-push-down.enabled` will be added to which
>>>>>>>> class? I think it should be `SourceReaderOptions`.
>>>>>>>> 2. How are the connector developers able to know and follow the FLIP? Do we
>>>>>>>> need an abstract base class or provide a default method?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Hang
>>>>>>>> 
>>>>>>>> On Mon, Oct 30, 2023 at 14:45, Jiabao Sun <jiabao....@xtransfer.cn.invalid> wrote:
>>>>>>>> 
>>>>>>>>> Hi, all,
>>>>>>>>> 
>>>>>>>>> Thanks for the lively discussion.
>>>>>>>>> 
>>>>>>>>> Based on the discussion, I have made some adjustments to the FLIP document:
>>>>>>>>> 
>>>>>>>>> 1. The name of the newly added option has been changed to
>>>>>>>>> "source.filter-push-down.enabled".
>>>>>>>>> 2. Considering compatibility with older versions, the newly added
>>>>>>>>> "source.filter-push-down.enabled" option needs to respect the optimizer's
>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" option.
>>>>>>>>> But there is a consideration to remove the old option in Flink 2.0.
>>>>>>>>> 3. We can provide more options to disable other source abilities with side
>>>>>>>>> effects, such as "source.aggregate.enabled" and "source.projection.enabled".
>>>>>>>>> This is not urgent and can be introduced incrementally.
>>>>>>>>> 
>>>>>>>>> Looking forward to your feedback again.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Jiabao
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Oct 29, 2023, at 08:45, Becket Qin <becket....@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Thanks for digging into the git history, Jark. I agree it makes sense to
>>>>>>>>>> deprecate this API in 2.0.
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> 
>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>> 
>>>>>>>>>> On Fri, Oct 27, 2023 at 5:47 PM Jark Wu <imj...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Becket,
>>>>>>>>>>> 
>>>>>>>>>>> I checked the history of
>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled".
>>>>>>>>>>> It seems it was introduced along with the legacy FilterableTableSource
>>>>>>>>>>> interface, which might have been an experimental feature at that time.
>>>>>>>>>>> I don't see the necessity of this option at the moment. Maybe we can
>>>>>>>>>>> deprecate this option and drop it in Flink 2.0[1] if it is not necessary
>>>>>>>>>>> anymore. This may help to simplify this discussion.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Jark
>>>>>>>>>>> 
>>>>>>>>>>> [1]: https://issues.apache.org/jira/browse/FLINK-32383
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, 26 Oct 2023 at 10:14, Becket Qin <becket....@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for the proposal, Jiabao. My two cents below:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. If I understand correctly, the motivation of the FLIP is mainly to
>>>>>>>>>>>> make predicate pushdown optional on SOME of the Sources. If so, intuitively
>>>>>>>>>>>> the configuration should be Source specific instead of general. Otherwise,
>>>>>>>>>>>> we will end up with general configurations that may not take effect for
>>>>>>>>>>>> some of the Source implementations. This violates the basic rule of a
>>>>>>>>>>>> configuration - it does what it says, regardless of the implementation.
>>>>>>>>>>>> While configuration standardization is usually a good thing, it should not
>>>>>>>>>>>> break the basic rules.
>>>>>>>>>>>> If we really want to have this general configuration, the sources to
>>>>>>>>>>>> which this configuration does not apply should throw an exception to make
>>>>>>>>>>>> it clear that this configuration is not supported. However, that seems ugly.
>>>>>>>>>>>> 
>>>>>>>>>>>> 2. I think the actual motivation of this FLIP is about "how a source
>>>>>>>>>>>> should implement predicate pushdown efficiently", not "whether predicate
>>>>>>>>>>>> pushdown should be applied to the source." For example, if a source wants
>>>>>>>>>>>> to avoid additional computing load in the external system, it can always
>>>>>>>>>>>> read the entire record and apply the predicates by itself. However, from
>>>>>>>>>>>> the Flink perspective, the predicate pushdown is applied; it is just
>>>>>>>>>>>> implemented differently by the source. So the design principle here is that
>>>>>>>>>>>> Flink only cares about whether a source supports predicate pushdown or not;
>>>>>>>>>>>> it does not care about the implementation efficiency / side effects of the
>>>>>>>>>>>> predicate pushdown. It is the Source implementation's responsibility to
>>>>>>>>>>>> ensure the predicate pushdown is implemented efficiently and does not
>>>>>>>>>>>> impose excessive pressure on the external system. And it is OK to have
>>>>>>>>>>>> additional configurations to achieve this goal. Obviously, such
>>>>>>>>>>>> configurations will be source specific in this case.
>>>>>>>>>>>> 
>>>>>>>>>>>> 3. Regarding the existing configuration
>>>>>>>>>>>> table.optimizer.source.predicate-pushdown-enabled:
>>>>>>>>>>>> I am not sure why we need it. Supposedly, if a source implements a
>>>>>>>>>>>> SupportsXXXPushDown interface, the optimizer should push the corresponding
>>>>>>>>>>>> predicates to the Source. I am not sure in which case this configuration
>>>>>>>>>>>> would be used. Any ideas @Jark Wu <imj...@gmail.com>?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> 
>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Oct 25, 2023 at 11:55 PM Jiabao Sun
>>>>>>>>>>>> <jiabao....@xtransfer.cn.invalid> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks Jane for the detailed explanation.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think that for users, we should respect conventions over
>>>>>>>>>>>>> configurations.
>>>>>>>>>>>>> Conventions can be default values explicitly specified in
>>>>>>>>>>>>> configurations, or they can be behaviors that follow previous versions.
>>>>>>>>>>>>> If the same code has different behaviors in different versions, it would
>>>>>>>>>>>>> be a very bad thing.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I agree that for regular users, it is not necessary to understand all
>>>>>>>>>>>>> the configurations related to Flink.
>>>>>>>>>>>>> By following conventions, they can have a good experience.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Let's get back to the practical situation and consider it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Case 1:
>>>>>>>>>>>>> The user is not familiar with the purpose of the
>>>>>>>>>>>>> table.optimizer.source.predicate-pushdown-enabled configuration but follows
>>>>>>>>>>>>> the convention of allowing predicate pushdown to the source by default.
>>>>>>>>>>>>> Just understanding the source.predicate-pushdown-enabled configuration
>>>>>>>>>>>>> and performing fine-grained toggle control will work well.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Case 2:
>>>>>>>>>>>>> The user understands the meaning of the
>>>>>>>>>>>>> table.optimizer.source.predicate-pushdown-enabled configuration and has set
>>>>>>>>>>>>> its value to false.
>>>>>>>>>>>>> We have reason to believe that the user understands the meaning of the
>>>>>>>>>>>>> predicate pushdown configuration and the intention is to disable predicate
>>>>>>>>>>>>> pushdown (rather than whether or not to allow it).
>>>>>>>>>>>>> The previous choice of globally disabling it is likely because it
>>>>>>>>>>>>> couldn't be disabled on individual sources.
>>>>>>>>>>>>> From this perspective, if we provide more fine-grained configuration
>>>>>>>>>>>>> support and provide detailed explanations of the configuration behaviors in
>>>>>>>>>>>>> the documentation,
>>>>>>>>>>>>> users can clearly understand the differences between these two
>>>>>>>>>>>>> configurations and use them correctly.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Also, I don't agree that
>>>>>>>>>>>>> table.optimizer.source.predicate-pushdown-enabled = true and
>>>>>>>>>>>>> source.predicate-pushdown-enabled = false means that the local
>>>>>>>>>>>>> configuration overrides the global configuration.
>>>>>>>>>>>>> On the contrary, both configurations are functioning correctly.
>>>>>>>>>>>>> The optimizer allows predicate pushdown to all sources, but some sources
>>>>>>>>>>>>> can reject the filters pushed down by the optimizer.
>>>>>>>>>>>>> This is natural, just like different components at different levels are
>>>>>>>>>>>>> responsible for different tasks.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The more serious issue is that if "source.predicate-pushdown-enabled"
>>>>>>>>>>>>> does not respect "table.optimizer.source.predicate-pushdown-enabled",
>>>>>>>>>>>>> the "table.optimizer.source.predicate-pushdown-enabled" configuration
>>>>>>>>>>>>> will be invalidated.
>>>>>>>>>>>>> This means that regardless of whether
>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" is set to true or
>>>>>>>>>>>>> false, it will have no effect.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Oct 25, 2023, at 22:24, Jane Chan <qingyue....@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Jiabao,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for the in-depth clarification. Here are my two cents:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> However, "table.optimizer.source.predicate-pushdown-enabled" and
>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" are configurations for different
>>>>>>>>>>>>>>> components (optimizer and source operator).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We cannot assume that every user would be interested in understanding the
>>>>>>>>>>>>>> internal components of Flink, such as the optimizer or connectors, and the
>>>>>>>>>>>>>> specific configurations associated with each component. Instead, users
>>>>>>>>>>>>>> might be more concerned about knowing which configuration enables or
>>>>>>>>>>>>>> disables the filter push-down feature for all source connectors, and which
>>>>>>>>>>>>>> parameter provides the flexibility to override this behavior for a single
>>>>>>>>>>>>>> source if needed.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So, from this perspective, I am inclined to divide these two parameters
>>>>>>>>>>>>>> based on the scope of their impact from the user's perspective (i.e.
>>>>>>>>>>>>>> global-level or operator-level), rather than categorizing them based on the
>>>>>>>>>>>>>> component hierarchy from a developer's point of view. Therefore, based on
>>>>>>>>>>>>>> this premise, it is intuitive and natural for users to
>>>>>>>>>>>>>> understand that fine-grained configuration options can override global
>>>>>>>>>>>>>> configurations.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Additionally, if "scan.filter-push-down.enabled" doesn't respect
>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" and the default value
>>>>>>>>>>>>>>> of "scan.filter-push-down.enabled" is defined as true,
>>>>>>>>>>>>>>> it means that just modifying
>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" as false will have no
>>>>>>>>>>>>>>> effect, and filter pushdown will still be performed.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> If we define the default value of "scan.filter-push-down.enabled" as
>>>>>>>>>>>>>>> false, it would introduce a difference in behavior compared to the previous
>>>>>>>>>>>>>>> version.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> <1> If I understand correctly, "scan.filter-push-down.enabled" is a
>>>>>>>>>>>>>> connector option, which means the only way to configure it is to explicitly
>>>>>>>>>>>>>> specify it in DDL (no matter whether to disable or enable it), and the SET
>>>>>>>>>>>>>> command is not applicable, so I think it's natural to still respect the
>>>>>>>>>>>>>> user's specification here. Otherwise, users might be more confused about
>>>>>>>>>>>>>> why the DDL does not work as expected, and the reason is just because some
>>>>>>>>>>>>>> other "optimizer" configuration is set to a different value.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> <2> From the implementation side, I am inclined to keep the parameter's
>>>>>>>>>>>>>> priority consistent for all conditions.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Let "global" denote "table.optimizer.source.predicate-pushdown-enabled",
>>>>>>>>>>>>>> and let "per-source" denote "scan.filter-push-down.enabled" for a specific
>>>>>>>>>>>>>> source T. The following truth table (based on the current design)
>>>>>>>>>>>>>> indicates the inconsistent behavior for "per-source override global".
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> .--------.------------.-----------------.----------------------------.
>>>>>>>>>>>>>> | global | per-source | push-down for T | per-source override global |
>>>>>>>>>>>>>> :--------+------------+-----------------+----------------------------:
>>>>>>>>>>>>>> | true   | false      | false           | Y                          |
>>>>>>>>>>>>>> :--------+------------+-----------------+----------------------------:
>>>>>>>>>>>>>> | false  | true       | false           | N                          |
>>>>>>>>>>>>>> .--------.------------.-----------------.----------------------------.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Jane
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Oct 25, 2023 at 6:22 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks Benchao for the feedback.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I understand that the configuration of global parallelism and task
>>>>>>>>>>>>>>> parallelism is at different granularities but with the same configuration.
>>>>>>>>>>>>>>> However, "table.optimizer.source.predicate-pushdown-enabled" and
>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" are configurations for different
>>>>>>>>>>>>>>> components (optimizer and source operator).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> From a user's perspective, there are two scenarios:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 1. Disabling all filter pushdown
>>>>>>>>>>>>>>> In this case, setting "table.optimizer.source.predicate-pushdown-enabled"
>>>>>>>>>>>>>>> to false is sufficient to meet the requirement.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 2. Disabling filter pushdown for specific sources
>>>>>>>>>>>>>>> In this scenario, there is no need to adjust the value of
>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled".
>>>>>>>>>>>>>>> Instead, the focus should be on the configuration of
>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" to meet the requirement.
>>>>>>>>>>>>>>> In this case, users do not need to set
>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" to false and manually
>>>>>>>>>>>>>>> enable filter pushdown for specific sources.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Additionally, if "scan.filter-push-down.enabled" doesn't respect
>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" and the default value
>>>>>>>>>>>>>>> of "scan.filter-push-down.enabled" is defined as true,
>>>>>>>>>>>>>>> it means that just modifying
>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" as false will have no
>>>>>>>>>>>>>>> effect, and filter pushdown will still be performed.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> If we define the default value of "scan.filter-push-down.enabled" as
>>>>>>>>>>>>>>> false, it would introduce a difference in behavior compared to the previous
>>>>>>>>>>>>>>> version.
>>>>>>>>>>>>>>> The same SQL query that could successfully push down filters in the old
>>>>>>>>>>>>>>> version would no longer do so after the upgrade.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Oct 25, 2023, at 17:10, Benchao Li <libenc...@apache.org> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks Jiabao for the detailed explanations, that helps a lot. I
>>>>>>>>>>>>>>>> understand your rationale now.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Correct me if I'm wrong. Your perspective is from the "developer", which
>>>>>>>>>>>>>>>> means there is an optimizer and a connector component, and if we want to
>>>>>>>>>>>>>>>> enable this feature (pushing filters down into connectors), you must
>>>>>>>>>>>>>>>> enable it first in the optimizer, and only then does the connector have
>>>>>>>>>>>>>>>> the chance to decide whether to use it or not.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> My perspective is from the "user" (why should a user care about the
>>>>>>>>>>>>>>>> difference between optimizer and connector?): this is a feature, and there
>>>>>>>>>>>>>>>> are two ways to control it. One way is to configure it at the job level,
>>>>>>>>>>>>>>>> the other one is in table properties. What a user expects is that they can
>>>>>>>>>>>>>>>> control a feature in a tiered way, that is, setting it per job, and then
>>>>>>>>>>>>>>>> fine-tuning it per table.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This is similar to other concepts, such as parallelism:
>>>>>>>>>>>>>>>> users can set a job-level default parallelism, and then fine-tune
>>>>>>>>>>>>>>>> it per operator. There may be more such debates in the future,
>>>>>>>>>>>>>>>> e.g., we can have a job-level config about adding key-by before lookup
>>>>>>>>>>>>>>>> join, and also a hint/table property way to control it at a fine-grained
>>>>>>>>>>>>>>>> level per lookup operator. Hence we'd better find a unified way for all
>>>>>>>>>>>>>>>> features of this kind.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Oct 25, 2023 at 15:27, Jiabao Sun <jiabao....@xtransfer.cn.invalid> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks Jane for further explanation.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> These two configurations correspond to different levels.
>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" does not make
>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate" invalid.
>>>>>>>>>>>>>>>>> The planner will still push down predicates to all sources.
>>>>>>>>>>>>>>>>> Whether filter pushdown is allowed or not is determined by the specific
>>>>>>>>>>>>>>>>> source's "scan.filter-push-down.enabled" configuration.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> However, "table.optimizer.source.predicate" does directly affect
>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled".
>>>>>>>>>>>>>>>>> When the planner disables predicate pushdown, the source-level filter
>>>>>>>>>>>>>>>>> pushdown will also not be executed, even if the source allows filter
>>>>>>>>>>>>>>>>> pushdown.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Either way, on points 1 and 2, our expectations are consistent.
>>>>>>>>>>>>>>>>> For the 3rd point, I still think that the planner-level configuration
>>>>>>>>>>>>>>>>> takes precedence over the source-level configuration.
>>>>>>>>>>>>>>>>> It may seem counterintuitive when we globally disable predicate
>>>>>>>>>>>>>>>>> pushdown but allow filter pushdown at the source level.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Oct 25, 2023, at 14:35, Jane Chan <qingyue....@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi Jiabao,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks for clarifying this. By "scan.filter-push-down.enabled takes a
>>>>>>>>>>>>>>>>>> higher priority" I meant that this value should be respected whenever it is
>>>>>>>>>>>>>>>>>> set explicitly.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The conclusion that
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 2. "table.optimizer.source.predicate" = "true" and
>>>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" = "false"
>>>>>>>>>>>>>>>>>>> Allow the planner to perform predicate pushdown, but individual sources do
>>>>>>>>>>>>>>>>>>> not enable filter pushdown.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> This indicates that the option "scan.filter-push-down.enabled = false" for
>>>>>>>>>>>>>>>>>> an individual source connector does indeed override the global-level
>>>>>>>>>>>>>>>>>> planner settings to make a difference. And thus it "has a higher priority".
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> While for
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 3. "table.optimizer.source.predicate" = "false"
>>>>>>>>>>>>>>>>>>> Predicate pushdown is not allowed for the planner.
>>>>>>>>>>>>>>>>>>> Regardless of the value of the "scan.filter-push-down.enabled"
>>>>>>>>>>>>>>>>>>> configuration, filter pushdown is disabled.
>>>>>>>>>>>>>>>>>>> In this scenario, the behavior remains consistent with the old version as
>>>>>>>>>>>>>>>>>>> well.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I still think "scan.filter-push-down.enabled" should also be respected if
>>>>>>>>>>>>>>>>>> it is enabled for individual connectors. WDYT?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Jane
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Oct 25, 2023 at 1:27 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks Benchao for the feedback.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> For the current proposal, we recommend keeping the default value of
>>>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate" as true,
>>>>>>>>>>>>>>>>>>> and setting the default value of the newly introduced option
>>>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" to true as well.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> The main purpose of doing this is to maintain consistency with previous
>>>>>>>>>>>>>>>>>>> versions, since in the old version whether filter pushdown is performed
>>>>>>>>>>>>>>>>>>> depends solely on the "table.optimizer.source.predicate" option.
>>>>>>>>>>>>>>>>>>> That means by default, as long as a TableSource implements the
>>>>>>>>>>>>>>>>>>> SupportsFilterPushDown interface, filter pushdown is allowed.
>>>>>>>>>>>>>>>>>>> And it seems that we don't have much benefit in changing the default value
>>>>>>>>>>>>>>>>>>> of "table.optimizer.source.predicate" to false.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Regarding the priority of these two configurations, I believe that
>>>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate"
>>>>>>>>>>>>>>>>>>> takes precedence over "scan.filter-push-down.enabled" and it exhibits the
>>>>>>>>>>>>>>>>>>> following behavior.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 1. "table.optimizer.source.predicate" = "true" and
>>>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" = "true"
>>>>>>>>>>>>>>>>>>> This is the default behavior, allowing filter pushdown for sources.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2. "table.optimizer.source.predicate" = "true" and
>>>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" = "false"
>>>>>>>>>>>>>>>>>>> Allow the planner to perform predicate pushdown, but individual
>>>>>>>>>>>>>>>>>>> sources do not enable filter pushdown.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 3. "table.optimizer.source.predicate" = "false"
>>>>>>>>>>>>>>>>>>> Predicate pushdown is not allowed for the planner.
>>>>>>>>>>>>>>>>>>> Regardless of the value of the "scan.filter-push-down.enabled"
>>>>>>>>>>>>>>>>>>> configuration, filter pushdown is disabled.
>>>>>>>>>>>>>>>>>>> In this scenario, the behavior remains consistent with the old
>>>>>>>>>>>>>>>>>>> version as well.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> From an implementation perspective, setting the priority of
>>>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" higher than
>>>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate" is difficult to achieve now,
>>>>>>>>>>>>>>>>>>> because the PushFilterIntoSourceScanRuleBase at the planner level takes
>>>>>>>>>>>>>>>>>>> precedence over the source-level FilterPushDownSpec.
>>>>>>>>>>>>>>>>>>> Only when the PushFilterIntoSourceScanRuleBase is enabled will the
>>>>>>>>>>>>>>>>>>> source-level filter pushdown be performed.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Additionally, in my opinion, there doesn't seem to be much benefit in
>>>>>>>>>>>>>>>>>>> setting a higher priority for "scan.filter-push-down.enabled".
>>>>>>>>>>>>>>>>>>> It may instead affect compatibility and increase implementation complexity.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Oct 25, 2023, at 11:56, Benchao Li <libenc...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I agree with Jane that fine-grained configurations should have higher
>>>>>>>>>>>>>>>>>>>> priority than job level configurations.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> For the current proposal, we can achieve that:
>>>>>>>>>>>>>>>>>>>> - Set "table.optimizer.source.predicate" = "true" to enable by
>>>>>>>>>>>>>>>>>>>> default, and set "scan.filter-push-down.enabled" = "false" to disable
>>>>>>>>>>>>>>>>>>>> it per table source
>>>>>>>>>>>>>>>>>>>> - Set "table.optimizer.source.predicate" = "false" to disable by
>>>>>>>>>>>>>>>>>>>> default, and set "scan.filter-push-down.enabled" = "true" to enable
>>>>>>>>>>>>>>>>>>>> it per table source
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Jane Chan <qingyue....@gmail.com> wrote on Tue, Oct 24, 2023 at 23:55:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I believe that the configuration "table.optimizer.source.predicate" has a
>>>>>>>>>>>>>>>>>>>>>> higher priority at the planner level than the configuration at the source
>>>>>>>>>>>>>>>>>>>>>> level,
>>>>>>>>>>>>>>>>>>>>>> and it seems easy to implement now.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Correct me if I'm wrong, but I think the fine-grained configuration
>>>>>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" should have a higher priority because the
>>>>>>>>>>>>>>>>>>>>> default value of "table.optimizer.source.predicate" is true. As a result,
>>>>>>>>>>>>>>>>>>>>> turning off filter push-down for a specific source will not take effect
>>>>>>>>>>>>>>>>>>>>> unless the default value of "table.optimizer.source.predicate" is changed
>>>>>>>>>>>>>>>>>>>>> to false or, alternatively, users manually set
>>>>>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate" to false first and then selectively
>>>>>>>>>>>>>>>>>>>>> enable filter push-down for the desired sources, which is less intuitive.
>>>>>>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>> Jane
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Thanks Jane,
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I believe that the configuration "table.optimizer.source.predicate" has a
>>>>>>>>>>>>>>>>>>>>>> higher priority at the planner level than the configuration at the source
>>>>>>>>>>>>>>>>>>>>>> level,
>>>>>>>>>>>>>>>>>>>>>> and it seems easy to implement now.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Oct 24, 2023, at 17:36, Jane Chan <qingyue....@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Hi Jiabao,
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks for driving this discussion. I have a small question: will
>>>>>>>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" take precedence over
>>>>>>>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate" when the two parameters conflict
>>>>>>>>>>>>>>>>>>>>>>> with each other?
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>> Jane
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Thanks Jark,
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> If we only add configuration without adding the enableFilterPushDown
>>>>>>>>>>>>>>>>>>>>>>>> method in the SupportsFilterPushDown interface,
>>>>>>>>>>>>>>>>>>>>>>>> each connector would have to handle the same logic in the applyFilters
>>>>>>>>>>>>>>>>>>>>>>>> method to determine whether filter pushdown is needed.
>>>>>>>>>>>>>>>>>>>>>>>> This would increase complexity and violate the original behavior of the
>>>>>>>>>>>>>>>>>>>>>>>> applyFilters method.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> In contrast, with the newly added enableFilterPushDown method we only
>>>>>>>>>>>>>>>>>>>>>>>> need to pass the configuration parameter
>>>>>>>>>>>>>>>>>>>>>>>> to decide whether to perform predicate pushdown.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> I think this approach would be clearer and simpler.
>>>>>>>>>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Oct 24, 2023, at 16:58, Jark Wu <imj...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jiabao,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I think the current interface can already satisfy your requirements.
>>>>>>>>>>>>>>>>>>>>>>>>> The connector can reject all the filters by returning the input filters
>>>>>>>>>>>>>>>>>>>>>>>>> as `Result#remainingFilters`.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> So maybe we don't need to introduce a new method to disable
>>>>>>>>>>>>>>>>>>>>>>>>> pushdown, but just introduce an option for the specific connector.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu <xbjt...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks @Jiabao for kicking off this discussion.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Could you add a section to explain the difference between the proposed
>>>>>>>>>>>>>>>>>>>>>>>>>> connector level config `scan.filter-push-down.enabled` and the existing
>>>>>>>>>>>>>>>>>>>>>>>>>> query level config `table.optimizer.source.predicate-pushdown-enabled`?
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>>> Leonard
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Oct 24, 2023, at 4:18 PM, Jiabao Sun <jiabao....@xtransfer.cn.INVALID> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Devs,
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to start a discussion on FLIP-377: support configuration to
>>>>>>>>>>>>>>>>>>>>>>>>>>> disable filter pushdown for Table/SQL Sources[1].
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Currently, Flink Table/SQL does not expose fine-grained control for
>>>>>>>>>>>>>>>>>>>>>>>>>>> users to enable or disable filter pushdown.
>>>>>>>>>>>>>>>>>>>>>>>>>>> However, filter pushdown has some side effects, such as additional
>>>>>>>>>>>>>>>>>>>>>>>>>>> computational pressure on external systems.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Moreover, improper queries can lead to issues such as full table scans,
>>>>>>>>>>>>>>>>>>>>>>>>>>> which in turn can impact the stability of external systems.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Suppose we have an SQL query with two sources: Kafka and a database.
>>>>>>>>>>>>>>>>>>>>>>>>>>> The database is sensitive to pressure, and we want to configure it to
>>>>>>>>>>>>>>>>>>>>>>>>>>> not perform filter pushdown to the database source.
>>>>>>>>>>>>>>>>>>>>>>>>>>> However, we still want to perform filter pushdown to the Kafka source to
>>>>>>>>>>>>>>>>>>>>>>>>>>> decrease network IO.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I propose to support configuration to disable filter pushdown for
>>>>>>>>>>>>>>>>>>>>>>>>>>> Table/SQL sources to let users decide whether to perform filter pushdown.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Benchao Li
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Benchao Li
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 
>> 
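
To make the two precedence models debated in the quoted thread concrete, here is a minimal Java sketch. It is not Flink code: the class and method names (PushdownPrecedence, plannerFirst, sourceFirst) are hypothetical, and sourceOverride stands in for an explicitly set "scan.filter-push-down.enabled" (empty when the option is not set).

```java
import java.util.Optional;

/** Hypothetical model of the two precedence rules discussed in this thread; not Flink code. */
final class PushdownPrecedence {

    /**
     * Planner-first model (the three cases listed by Jiabao): the planner option gates
     * everything and the per-source option can only narrow it. An unset per-source
     * option is treated as true, matching the proposed default.
     */
    static boolean plannerFirst(boolean plannerEnabled, Optional<Boolean> sourceOverride) {
        return plannerEnabled && sourceOverride.orElse(true);
    }

    /**
     * Source-first model (suggested by Jane and Benchao): an explicitly set per-source
     * option wins; otherwise the planner option acts as the default.
     */
    static boolean sourceFirst(boolean plannerEnabled, Optional<Boolean> sourceOverride) {
        return sourceOverride.orElse(plannerEnabled);
    }

    public static void main(String[] args) {
        // The two models disagree only when the planner option is false
        // and a source explicitly opts in.
        System.out.println(plannerFirst(false, Optional.of(true)));
        System.out.println(sourceFirst(false, Optional.of(true)));
    }
}
```

In both models, planner = true with source = false disables pushdown for that source; the disagreement is limited to planner = false with source = true.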

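For reference, the alternative Jark describes in the quoted thread (a connector rejecting all filters by returning the input filters as the remaining filters) could look roughly like the sketch below. It uses stand-in types only: FilterResult, its fields, and the pushDownEnabled flag are hypothetical and merely mimic the accepted/remaining shape of a pushdown result; filters are plain strings for brevity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Stand-in sketch; the types here only mimic the shape of a filter pushdown result. */
final class RejectAllFiltersSketch {

    /** Hypothetical result type: filters the source takes over vs. filters it hands back. */
    static final class FilterResult {
        final List<String> acceptedFilters;
        final List<String> remainingFilters;

        FilterResult(List<String> acceptedFilters, List<String> remainingFilters) {
            this.acceptedFilters = acceptedFilters;
            this.remainingFilters = remainingFilters;
        }
    }

    /**
     * Hypothetical connector logic: when the connector-level option is off, accept nothing
     * and return every input filter as remaining, so the filters are still evaluated
     * after the scan instead of inside the external system.
     */
    static FilterResult applyFilters(List<String> filters, boolean pushDownEnabled) {
        if (!pushDownEnabled) {
            return new FilterResult(Collections.emptyList(), filters);
        }
        // Simplification: a real connector would accept only the filters it can translate.
        return new FilterResult(filters, Collections.emptyList());
    }

    public static void main(String[] args) {
        FilterResult r = applyFilters(Arrays.asList("a > 1", "b = 'x'"), false);
        System.out.println(r.acceptedFilters.size() + " accepted, "
                + r.remainingFilters.size() + " remaining");
    }
}
```

The trade-off raised in the thread is that every connector would have to repeat this check in its own applyFilters implementation.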
