Thanks Venkatakrishnan for the feedback. Taking MySQL as an example, if the pushed-down filter does not hit an index, it will result in a full table scan. For a table with a large amount of data, a full table scan can consume a significant amount of CPU resources, increase response time, hold connections for a long time, and impact the overall performance of the database.
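
To make the MySQL case concrete, here is a minimal sketch in plain Python (an illustration only, not Flink code). It assumes the connector can split the pushed filters into "accepted" and "remaining" ones, in the spirit of the `Result#remainingFilters` approach Jark mentioned. The names `Filter`, `apply_filters`, `indexed_columns` and `pushdown_enabled` are hypothetical, and the "accept only filters on indexed columns" rule is a deliberately crude heuristic. In practice the source often cannot know reliably whether a predicate will use an index, which is why a user-facing switch is still useful.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Filter:
    """Simplified stand-in for a resolved predicate such as `id = 42`."""
    column: str
    expression: str


def apply_filters(
    filters: List[Filter],
    indexed_columns: Set[str],
    pushdown_enabled: bool = True,
) -> Tuple[List[Filter], List[Filter]]:
    """Split filters into (accepted, remaining).

    Accepted filters would be evaluated by the database; remaining
    filters would be evaluated by the engine after the rows are read.
    """
    if not pushdown_enabled:
        # Hypothetical per-source switch: reject everything.
        return [], list(filters)
    # Crude heuristic: only push filters whose column has an index.
    accepted = [f for f in filters if f.column in indexed_columns]
    remaining = [f for f in filters if f.column not in indexed_columns]
    return accepted, remaining


filters = [Filter("id", "id = 42"), Filter("note", "note LIKE '%x%'")]
accepted, remaining = apply_filters(filters, indexed_columns={"id"})
print("pushed to database:", [f.expression for f in accepted])
print("kept in the engine:", [f.expression for f in remaining])
```

With the switch turned off, the same call returns every filter as "remaining", so the database only sees a plain scan and the predicates are applied after reading.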

Best,
Jiabao

> On Oct 28, 2023, at 13:34, Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote:
>
> Thanks for the proposal, Jiabao.
>
> I agree with Becket: if a *Source* implements the *SupportsXXXPushDown*
> interface (in this case *SupportsFilterPushDown*), then the *Source* (in
> your FLIP example, a database) is designed to support filter pushdown. The
> corresponding Source can have mechanisms built into it to detect cases
> where applying the filter pushdown adds computation pressure that can
> affect the stability of the system and, if so, disable it.
>
> Could you please elaborate on the use cases where users know upfront (but
> the source cannot detect) that, for a specific job or SQL query,
> *applyFilters* could negatively affect the overall performance of the
> query or the external system, or any other use cases where the pushdown
> has to be selectively disabled for specific sources?
>
> Regards
> Venkata krishnan
>
> On Fri, Oct 27, 2023 at 2:48 AM Jark Wu <imj...@gmail.com> wrote:
>
>> Hi Becket,
>>
>> I checked the history of
>> "*table.optimizer.source.predicate-pushdown-enabled*". It seems it was
>> introduced with the legacy FilterableTableSource interface, which might
>> have been an experimental feature at that time. I don't see the necessity
>> of this option at the moment. Maybe we can deprecate this option and drop
>> it in Flink 2.0[1] if it is not necessary anymore. This may help to
>> simplify this discussion.
>>
>> Best,
>> Jark
>>
>> [1]: https://issues.apache.org/jira/browse/FLINK-32383
>>
>> On Thu, 26 Oct 2023 at 10:14, Becket Qin <becket....@gmail.com> wrote:
>>
>>> Thanks for the proposal, Jiabao. My two cents below:
>>>
>>> 1. If I understand correctly, the motivation of the FLIP is mainly to
>>> make predicate pushdown optional on SOME of the Sources. If so,
>>> intuitively the configuration should be Source specific instead of
>>> general. Otherwise, we will end up with general configurations that may
>>> not take effect for some of the Source implementations. This violates the
>>> basic rule of a configuration: it does what it says, regardless of the
>>> implementation. While configuration standardization is usually a good
>>> thing, it should not break the basic rules.
>>> If we really want to have this general configuration, the sources to
>>> which this configuration does not apply should throw an exception to make
>>> it clear that this configuration is not supported. However, that seems
>>> ugly.
>>>
>>> 2. I think the actual motivation of this FLIP is about "how a source
>>> should implement predicate pushdown efficiently", not "whether predicate
>>> pushdown should be applied to the source." For example, if a source wants
>>> to avoid additional computing load in the external system, it can always
>>> read the entire record and apply the predicates by itself. However, from
>>> the Flink perspective, the predicate pushdown is applied; it is just
>>> implemented differently by the source. So the design principle here is
>>> that Flink only cares about whether a source supports predicate pushdown
>>> or not; it does not care about the implementation efficiency or side
>>> effects of the predicate pushdown. It is the Source implementation's
>>> responsibility to ensure the predicate pushdown is implemented
>>> efficiently and does not impose excessive pressure on the external
>>> system. And it is OK to have additional configurations to achieve this
>>> goal. Obviously, such configurations will be source specific in this
>>> case.
>>>
>>> 3. Regarding the existing configuration
>>> *table.optimizer.source.predicate-pushdown-enabled*, I am not sure why we
>>> need it.
>>> Supposedly, if a source implements a SupportsXXXPushDown interface, the
>>> optimizer should push the corresponding predicates to the Source. I am
>>> not sure in which case this configuration would be used. Any ideas
>>> @Jark Wu <imj...@gmail.com>?
>>>
>>> Thanks,
>>>
>>> Jiangjie (Becket) Qin
>>>
>>> On Wed, Oct 25, 2023 at 11:55 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid> wrote:
>>>
>>>> Thanks Jane for the detailed explanation.
>>>>
>>>> I think that for users, we should respect conventions over
>>>> configurations.
>>>> Conventions can be default values explicitly specified in
>>>> configurations, or they can be behaviors that follow previous versions.
>>>> If the same code has different behaviors in different versions, it would
>>>> be a very bad thing.
>>>>
>>>> I agree that for regular users, it is not necessary to understand all
>>>> the configurations related to Flink.
>>>> By following conventions, they can have a good experience.
>>>>
>>>> Let's get back to the practical situation and consider it.
>>>>
>>>> Case 1:
>>>> The user is not familiar with the purpose of the
>>>> table.optimizer.source.predicate-pushdown-enabled configuration but
>>>> follows the convention of allowing predicate pushdown to the source by
>>>> default.
>>>> Just understanding the source.predicate-pushdown-enabled configuration
>>>> and performing fine-grained toggle control will work well.
>>>>
>>>> Case 2:
>>>> The user understands the meaning of the
>>>> table.optimizer.source.predicate-pushdown-enabled configuration and has
>>>> set its value to false.
>>>> We have reason to believe that the user understands the meaning of the
>>>> predicate pushdown configuration and the intention is to disable
>>>> predicate pushdown (rather than whether or not to allow it).
>>>> The previous choice of globally disabling it is likely because it
>>>> couldn't be disabled on individual sources.
>>>> From this perspective, if we provide more fine-grained configuration
>>>> support and provide detailed explanations of the configuration behaviors
>>>> in the documentation,
>>>> users can clearly understand the differences between these two
>>>> configurations and use them correctly.
>>>>
>>>> Also, I don't agree that
>>>> table.optimizer.source.predicate-pushdown-enabled = true and
>>>> source.predicate-pushdown-enabled = false means that the local
>>>> configuration overrides the global configuration.
>>>> On the contrary, both configurations are functioning correctly.
>>>> The optimizer allows predicate pushdown to all sources, but some sources
>>>> can reject the filters pushed down by the optimizer.
>>>> This is natural, just like different components at different levels are
>>>> responsible for different tasks.
>>>>
>>>> The more serious issue is that if "source.predicate-pushdown-enabled"
>>>> does not respect "table.optimizer.source.predicate-pushdown-enabled",
>>>> the "table.optimizer.source.predicate-pushdown-enabled" configuration
>>>> will be invalidated.
>>>> This means that regardless of whether
>>>> "table.optimizer.source.predicate-pushdown-enabled" is set to true or
>>>> false, it will have no effect.
>>>>
>>>> Best,
>>>> Jiabao
>>>>
>>>>> On Oct 25, 2023, at 22:24, Jane Chan <qingyue....@gmail.com> wrote:
>>>>>
>>>>> Hi Jiabao,
>>>>>
>>>>> Thanks for the in-depth clarification. Here are my two cents.
>>>>>
>>>>>> However, "table.optimizer.source.predicate-pushdown-enabled" and
>>>>>> "scan.filter-push-down.enabled" are configurations for different
>>>>>> components (optimizer and source operator).
>>>>>
>>>>> We cannot assume that every user would be interested in understanding
>>>>> the internal components of Flink, such as the optimizer or connectors,
>>>>> and the specific configurations associated with each component.
>>>>> Instead, users might be more concerned about knowing which
>>>>> configuration enables or disables the filter push-down feature for all
>>>>> source connectors, and which parameter provides the flexibility to
>>>>> override this behavior for a single source if needed.
>>>>>
>>>>> So, from this perspective, I am inclined to divide these two parameters
>>>>> based on the scope of their impact from the user's perspective (i.e.
>>>>> global-level or operator-level), rather than categorizing them based on
>>>>> the component hierarchy from a developer's point of view. Therefore,
>>>>> based on this premise, it is intuitive and natural for users to
>>>>> understand that fine-grained configuration options can override global
>>>>> configurations.
>>>>>
>>>>>> Additionally, if "scan.filter-push-down.enabled" doesn't respect
>>>>>> "table.optimizer.source.predicate-pushdown-enabled" and the default
>>>>>> value of "scan.filter-push-down.enabled" is defined as true,
>>>>>> it means that just modifying
>>>>>> "table.optimizer.source.predicate-pushdown-enabled" to false will have
>>>>>> no effect, and filter pushdown will still be performed.
>>>>>>
>>>>>> If we define the default value of "scan.filter-push-down.enabled" as
>>>>>> false, it would introduce a difference in behavior compared to the
>>>>>> previous version.
>>>>>
>>>>> <1> If I understand correctly, "scan.filter-push-down.enabled" is a
>>>>> connector option, which means the only way to configure it is to
>>>>> explicitly specify it in DDL (no matter whether to disable or enable
>>>>> it), and the SET command is not applicable, so I think it's natural to
>>>>> still respect the user's specification here.
>>>>> Otherwise, users might be more confused about why the DDL does not work
>>>>> as expected, when the reason is just that some other "optimizer"
>>>>> configuration is set to a different value.
>>>>>
>>>>> <2> From the implementation side, I am inclined to keep the parameter's
>>>>> priority consistent for all conditions.
>>>>>
>>>>> Let "global" denote "table.optimizer.source.predicate-pushdown-enabled",
>>>>> and let "per-source" denote "scan.filter-push-down.enabled" for a
>>>>> specific source T. The following truth table (based on the current
>>>>> design) shows the inconsistent behavior for "per-source override
>>>>> global".
>>>>>
>>>>> .--------.------------.-----------------.----------------------------.
>>>>> | global | per-source | push-down for T | per-source override global |
>>>>> :--------+------------+-----------------+----------------------------:
>>>>> | true   | false      | false           | Y                          |
>>>>> :--------+------------+-----------------+----------------------------:
>>>>> | false  | true       | false           | N                          |
>>>>> .--------.------------.-----------------.----------------------------.
>>>>>
>>>>> Best,
>>>>> Jane
>>>>>
>>>>> On Wed, Oct 25, 2023 at 6:22 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid> wrote:
>>>>>
>>>>>> Thanks Benchao for the feedback.
>>>>>>
>>>>>> I understand that the configuration of global parallelism and task
>>>>>> parallelism is at different granularities but with the same
>>>>>> configuration.
>>>>>> However, "table.optimizer.source.predicate-pushdown-enabled" and
>>>>>> "scan.filter-push-down.enabled" are configurations for different
>>>>>> components (optimizer and source operator).
>>>>>>
>>>>>> From a user's perspective, there are two scenarios:
>>>>>>
>>>>>> 1. Disabling all filter pushdown
>>>>>> In this case, setting
>>>>>> "table.optimizer.source.predicate-pushdown-enabled" to false is
>>>>>> sufficient to meet the requirement.
>>>>>>
>>>>>> 2. Disabling filter pushdown for specific sources
>>>>>> In this scenario, there is no need to adjust the value of
>>>>>> "table.optimizer.source.predicate-pushdown-enabled".
>>>>>> Instead, the focus should be on the configuration of
>>>>>> "scan.filter-push-down.enabled" to meet the requirement.
>>>>>> In this case, users do not need to set
>>>>>> "table.optimizer.source.predicate-pushdown-enabled" to false and
>>>>>> manually enable filter pushdown for specific sources.
>>>>>>
>>>>>> Additionally, if "scan.filter-push-down.enabled" doesn't respect
>>>>>> "table.optimizer.source.predicate-pushdown-enabled" and the default
>>>>>> value of "scan.filter-push-down.enabled" is defined as true,
>>>>>> it means that just modifying
>>>>>> "table.optimizer.source.predicate-pushdown-enabled" to false will have
>>>>>> no effect, and filter pushdown will still be performed.
>>>>>>
>>>>>> If we define the default value of "scan.filter-push-down.enabled" as
>>>>>> false, it would introduce a difference in behavior compared to the
>>>>>> previous version.
>>>>>> The same SQL query that could successfully push down filters in the
>>>>>> old version would no longer do so after the upgrade.
>>>>>>
>>>>>> Best,
>>>>>> Jiabao
>>>>>>
>>>>>>> On Oct 25, 2023, at 17:10, Benchao Li <libenc...@apache.org> wrote:
>>>>>>>
>>>>>>> Thanks Jiabao for the detailed explanations, that helps a lot, I
>>>>>>> understand your rationale now.
>>>>>>>
>>>>>>> Correct me if I'm wrong.
>>>>>>> Your perspective is that of a "developer": there is an optimizer
>>>>>>> component and a connector component, and to enable this feature
>>>>>>> (pushing filters down into connectors) you must first enable it in
>>>>>>> the optimizer, and only then does the connector have the chance to
>>>>>>> decide whether to use it.
>>>>>>>
>>>>>>> My perspective is that of a "user" (why should a user care about the
>>>>>>> difference between optimizer and connector?): this is one feature
>>>>>>> with two ways to control it, one at the job level and the other in
>>>>>>> table properties. What a user expects is to control a feature in a
>>>>>>> tiered way: set it per job, and then fine-tune it per table.
>>>>>>>
>>>>>>> This is somewhat similar to other concepts, such as parallelism:
>>>>>>> users can set a job-level default parallelism, and then fine-tune it
>>>>>>> per operator. There may be more such debates in the future, e.g., we
>>>>>>> can have a job-level config about adding key-by before lookup join,
>>>>>>> and also a hint/table property to control it per lookup operator.
>>>>>>> Hence we'd better find a unified way for all features of this kind.
>>>>>>>
>>>>>>> Jiabao Sun <jiabao....@xtransfer.cn.invalid> wrote on Wed, Oct 25, 2023 at 15:27:
>>>>>>>>
>>>>>>>> Thanks Jane for further explanation.
>>>>>>>>
>>>>>>>> These two configurations correspond to different levels.
>>>>>>>> "scan.filter-push-down.enabled" does not make
>>>>>>>> "table.optimizer.source.predicate" invalid.
>>>>>>>> The planner will still push down predicates to all sources.
>>>>>>>> Whether filter pushdown is allowed or not is determined by the
>>>>>>>> specific source's "scan.filter-push-down.enabled" configuration.
>>>>>>>>
>>>>>>>> However, "table.optimizer.source.predicate" does directly affect
>>>>>>>> "scan.filter-push-down.enabled".
>>>>>>>> When the planner disables predicate pushdown, the source-level
>>>>>>>> filter pushdown will also not be executed, even if the source allows
>>>>>>>> filter pushdown.
>>>>>>>>
>>>>>>>> In any case, for points 1 and 2 our expectations are consistent.
>>>>>>>> For the 3rd point, I still think that the planner-level
>>>>>>>> configuration takes precedence over the source-level configuration.
>>>>>>>> It may seem counterintuitive when we globally disable predicate
>>>>>>>> pushdown but allow filter pushdown at the source level.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jiabao
>>>>>>>>
>>>>>>>>> On Oct 25, 2023, at 14:35, Jane Chan <qingyue....@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Jiabao,
>>>>>>>>>
>>>>>>>>> Thanks for clarifying this. By "scan.filter-push-down.enabled takes
>>>>>>>>> a higher priority" I meant that this value should be respected
>>>>>>>>> whenever it is set explicitly.
>>>>>>>>>
>>>>>>>>> Regarding the conclusion that
>>>>>>>>>
>>>>>>>>>> 2. "table.optimizer.source.predicate" = "true" and
>>>>>>>>>> "scan.filter-push-down.enabled" = "false"
>>>>>>>>>> Allow the planner to perform predicate pushdown, but individual
>>>>>>>>>> sources do not enable filter pushdown.
>>>>>>>>>
>>>>>>>>> This indicates that the option "scan.filter-push-down.enabled =
>>>>>>>>> false" for an individual source connector does indeed override the
>>>>>>>>> global-level planner settings to make a difference. And thus it
>>>>>>>>> "has a higher priority".
>>>>>>>>>
>>>>>>>>> While for
>>>>>>>>>
>>>>>>>>>> 3. "table.optimizer.source.predicate" = "false"
>>>>>>>>>> Predicate pushdown is not allowed for the planner.
>>>>>>>>>> Regardless of the value of the "scan.filter-push-down.enabled"
>>>>>>>>>> configuration, filter pushdown is disabled.
>>>>>>>>>> In this scenario, the behavior remains consistent with the old
>>>>>>>>>> version as well.
>>>>>>>>>
>>>>>>>>> I still think "scan.filter-push-down.enabled" should also be
>>>>>>>>> respected if it is enabled for individual connectors. WDYT?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jane
>>>>>>>>>
>>>>>>>>> On Wed, Oct 25, 2023 at 1:27 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Benchao for the feedback.
>>>>>>>>>>
>>>>>>>>>> For the current proposal, we recommend keeping the default value
>>>>>>>>>> of "table.optimizer.source.predicate" as true,
>>>>>>>>>> and setting the default value of the newly introduced option
>>>>>>>>>> "scan.filter-push-down.enabled" to true as well.
>>>>>>>>>>
>>>>>>>>>> The main purpose of doing this is to maintain consistency with
>>>>>>>>>> previous versions, as whether to perform filter pushdown in the
>>>>>>>>>> old version solely depends on the
>>>>>>>>>> "table.optimizer.source.predicate" option.
>>>>>>>>>> That means by default, as long as a TableSource implements the
>>>>>>>>>> SupportsFilterPushDown interface, filter pushdown is allowed.
>>>>>>>>>> And it seems that we don't have much benefit in changing the
>>>>>>>>>> default value of "table.optimizer.source.predicate" to false.
>>>>>>>>>>
>>>>>>>>>> Regarding the priority of these two configurations, I believe that
>>>>>>>>>> "table.optimizer.source.predicate" takes precedence over
>>>>>>>>>> "scan.filter-push-down.enabled" and it exhibits the following
>>>>>>>>>> behavior.
>>>>>>>>>>
>>>>>>>>>> 1. "table.optimizer.source.predicate" = "true" and
>>>>>>>>>> "scan.filter-push-down.enabled" = "true"
>>>>>>>>>> This is the default behavior, allowing filter pushdown for
>>>>>>>>>> sources.
>>>>>>>>>>
>>>>>>>>>> 2. "table.optimizer.source.predicate" = "true" and
>>>>>>>>>> "scan.filter-push-down.enabled" = "false"
>>>>>>>>>> Allow the planner to perform predicate pushdown, but individual
>>>>>>>>>> sources do not enable filter pushdown.
>>>>>>>>>>
>>>>>>>>>> 3. "table.optimizer.source.predicate" = "false"
>>>>>>>>>> Predicate pushdown is not allowed for the planner.
>>>>>>>>>> Regardless of the value of the "scan.filter-push-down.enabled"
>>>>>>>>>> configuration, filter pushdown is disabled.
>>>>>>>>>> In this scenario, the behavior remains consistent with the old
>>>>>>>>>> version as well.
>>>>>>>>>>
>>>>>>>>>> From an implementation perspective, setting the priority of
>>>>>>>>>> "scan.filter-push-down.enabled" higher than
>>>>>>>>>> "table.optimizer.source.predicate" is difficult to achieve now,
>>>>>>>>>> because the PushFilterIntoSourceScanRuleBase at the planner level
>>>>>>>>>> takes precedence over the source-level FilterPushDownSpec.
>>>>>>>>>> Only when the PushFilterIntoSourceScanRuleBase is enabled will the
>>>>>>>>>> source-level filter pushdown be performed.
>>>>>>>>>>
>>>>>>>>>> Additionally, in my opinion, there doesn't seem to be much benefit
>>>>>>>>>> in setting a higher priority for "scan.filter-push-down.enabled".
>>>>>>>>>> It may instead affect compatibility and increase implementation
>>>>>>>>>> complexity.
>>>>>>>>>>
>>>>>>>>>> WDYT?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Jiabao
>>>>>>>>>>
>>>>>>>>>>> On Oct 25, 2023, at 11:56, Benchao Li <libenc...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I agree with Jane that fine-grained configurations should have
>>>>>>>>>>> higher priority than job level configurations.
>>>>>>>>>>>
>>>>>>>>>>> For the current proposal, we can achieve that:
>>>>>>>>>>> - Set "table.optimizer.source.predicate" = "true" to enable by
>>>>>>>>>>> default, and set "scan.filter-push-down.enabled" = "false" to
>>>>>>>>>>> disable it per table source
>>>>>>>>>>> - Set "table.optimizer.source.predicate" = "false" to disable by
>>>>>>>>>>> default, and set "scan.filter-push-down.enabled" = "true" to
>>>>>>>>>>> enable it per table source
>>>>>>>>>>>
>>>>>>>>>>> Jane Chan <qingyue....@gmail.com> wrote on Tue, Oct 24, 2023 at 23:55:
>>>>>>>>>>>>
>>>>>>>>>>>>> I believe that the configuration
>>>>>>>>>>>>> "table.optimizer.source.predicate" has a higher priority at the
>>>>>>>>>>>>> planner level than the configuration at the source level,
>>>>>>>>>>>>> and it seems easy to implement now.
>>>>>>>>>>>>
>>>>>>>>>>>> Correct me if I'm wrong, but I think the fine-grained
>>>>>>>>>>>> configuration "scan.filter-push-down.enabled" should have a
>>>>>>>>>>>> higher priority because the default value of
>>>>>>>>>>>> "table.optimizer.source.predicate" is true. As a result, turning
>>>>>>>>>>>> off filter push-down for a specific source will not take effect
>>>>>>>>>>>> unless the default value of "table.optimizer.source.predicate"
>>>>>>>>>>>> is changed to false, or, alternatively, users manually set
>>>>>>>>>>>> "table.optimizer.source.predicate" to false first and then
>>>>>>>>>>>> selectively enable filter push-down for the desired sources,
>>>>>>>>>>>> which is less intuitive.
>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Jane
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Jane,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I believe that the configuration
>>>>>>>>>>>>> "table.optimizer.source.predicate" has a higher priority at the
>>>>>>>>>>>>> planner level than the configuration at the source level,
>>>>>>>>>>>>> and it seems easy to implement now.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 24, 2023, at 17:36, Jane Chan <qingyue....@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Jiabao,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for driving this discussion. I have a small question:
>>>>>>>>>>>>>> will "scan.filter-push-down.enabled" take precedence over
>>>>>>>>>>>>>> "table.optimizer.source.predicate" when the two parameters
>>>>>>>>>>>>>> conflict with each other?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Jane
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks Jark,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If we only add configuration without adding the
>>>>>>>>>>>>>>> enableFilterPushDown method in the SupportsFilterPushDown
>>>>>>>>>>>>>>> interface,
>>>>>>>>>>>>>>> each connector would have to handle the same logic in the
>>>>>>>>>>>>>>> applyFilters method to determine whether filter pushdown is
>>>>>>>>>>>>>>> needed.
>>>>>>>>>>>>>>> This would increase complexity and violate the original
>>>>>>>>>>>>>>> behavior of the applyFilters method.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On the contrary, we only need to pass the configuration
>>>>>>>>>>>>>>> parameter in the newly added enableFilterPushDown method
>>>>>>>>>>>>>>> to decide whether to perform predicate pushdown.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think this approach would be clearer and simpler.
>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Oct 24, 2023, at 16:58, Jark Wu <imj...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Jiabao,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think the current interface can already satisfy your
>>>>>>>>>>>>>>>> requirements.
>>>>>>>>>>>>>>>> The connector can reject all the filters by returning the
>>>>>>>>>>>>>>>> input filters as `Result#remainingFilters`.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So maybe we don't need to introduce a new method to disable
>>>>>>>>>>>>>>>> pushdown, but just introduce an option for the specific
>>>>>>>>>>>>>>>> connector.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu <xbjt...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks @Jiabao for kicking off this discussion.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Could you add a section to explain the difference between
>>>>>>>>>>>>>>>>> the proposed connector-level config
>>>>>>>>>>>>>>>>> `scan.filter-push-down.enabled` and the existing
>>>>>>>>>>>>>>>>> query-level config
>>>>>>>>>>>>>>>>> `table.optimizer.source.predicate-pushdown-enabled`?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Leonard
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Oct 24, 2023, at 4:18 PM, Jiabao Sun <jiabao....@xtransfer.cn.INVALID> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Devs,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I would like to start a discussion on FLIP-377: support
>>>>>>>>>>>>>>>>>> configuration to disable filter pushdown for Table/SQL
>>>>>>>>>>>>>>>>>> Sources[1].
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Currently, Flink Table/SQL does not expose fine-grained
>>>>>>>>>>>>>>>>>> control for users to enable or disable filter pushdown.
>>>>>>>>>>>>>>>>>> However, filter pushdown has some side effects, such as
>>>>>>>>>>>>>>>>>> additional computational pressure on external systems.
>>>>>>>>>>>>>>>>>> Moreover, improper queries can lead to issues such as full
>>>>>>>>>>>>>>>>>> table scans, which in turn can impact the stability of
>>>>>>>>>>>>>>>>>> external systems.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Suppose we have an SQL query with two sources: Kafka and a
>>>>>>>>>>>>>>>>>> database.
>>>>>>>>>>>>>>>>>> The database is sensitive to pressure, and we want to
>>>>>>>>>>>>>>>>>> configure it to not perform filter pushdown to the
>>>>>>>>>>>>>>>>>> database source.
>>>>>>>>>>>>>>>>>> However, we still want to perform filter pushdown to the
>>>>>>>>>>>>>>>>>> Kafka source to decrease network IO.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I propose to support configuration to disable filter
>>>>>>>>>>>>>>>>>> pushdown for Table/SQL sources to let users decide whether
>>>>>>>>>>>>>>>>>> to perform filter pushdown.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Benchao Li
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Best,
>>>>>>> Benchao Li
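
For readers comparing the two precedence semantics debated in the quoted thread, the following is a small sketch in plain Python (not Flink code). `gate_model` mirrors the view that the planner-level option gates the source-level option, and `override_model` mirrors the view that an explicitly set per-source option wins over the global one. Both function names are hypothetical, and modelling "not set" as `None` is an assumption made purely for illustration.

```python
from typing import Optional


def gate_model(global_enabled: bool, per_source: bool = True) -> bool:
    """Planner-level option gates the source-level one: both must be true."""
    return global_enabled and per_source


def override_model(global_enabled: bool, per_source: Optional[bool] = None) -> bool:
    """An explicitly set per-source option wins; otherwise use the global one."""
    return global_enabled if per_source is None else per_source


# Print both models side by side for every explicit combination.
for global_enabled in (True, False):
    for per_source in (True, False):
        print(
            f"global={global_enabled!s:<5} per-source={per_source!s:<5} "
            f"gate={gate_model(global_enabled, per_source)!s:<5} "
            f"override={override_model(global_enabled, per_source)}"
        )
```

The two models only disagree on the row global=false, per-source=true, which is exactly the row marked "N" in Jane's truth table.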