Thanks for the improvements, Jiabao.

There are some details that I am not sure about.

1. Which class will the new option `source.filter-push-down.enabled` be added to? I think it should be `SourceReaderOptions`.
2. How will connector developers learn about and follow the FLIP? Do we need an abstract base class, or should we provide a default method?
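To make the second question concrete, here is a rough sketch of the default-method variant. It is only an illustration under my own assumptions: `FilterPushDownSketch`, `FilterPushDownSource`, `filterPushDownEnabled`, `doApplyFilters`, `effectivePushDown` and `acceptAllSource` are hypothetical names, filters are modeled as plain strings, and `Result` is a simplified stand-in for `SupportsFilterPushDown.Result`, not the real Flink class. The default method rejects all filters by returning them as remaining filters, as Jark suggested earlier in the thread, and `effectivePushDown` mirrors the rule that the new option respects `table.optimizer.source.predicate-pushdown-enabled`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustration only: simplified stand-ins, not the actual Flink interfaces. */
public class FilterPushDownSketch {

    /** Stand-in for SupportsFilterPushDown.Result (filters are plain strings here). */
    public static final class Result {
        public final List<String> acceptedFilters;
        public final List<String> remainingFilters;

        public Result(List<String> acceptedFilters, List<String> remainingFilters) {
            this.acceptedFilters = acceptedFilters;
            this.remainingFilters = remainingFilters;
        }
    }

    /** Hypothetical interface: the default method holds the shared option check. */
    public interface FilterPushDownSource {

        /** Hypothetical accessor for this table's 'source.filter-push-down.enabled' value. */
        boolean filterPushDownEnabled();

        /** Connector-specific pushdown logic, called only when the option is enabled. */
        Result doApplyFilters(List<String> filters);

        default Result applyFilters(List<String> filters) {
            if (!filterPushDownEnabled()) {
                // Reject everything: accept no filter and hand all of them back to the planner.
                return new Result(Collections.<String>emptyList(), new ArrayList<>(filters));
            }
            return doApplyFilters(filters);
        }
    }

    /** Assumed precedence: the source option only matters if the optimizer option is on. */
    public static boolean effectivePushDown(boolean optimizerEnabled, boolean sourceEnabled) {
        return optimizerEnabled && sourceEnabled;
    }

    /** Toy source that accepts every filter whenever pushdown is enabled. */
    public static FilterPushDownSource acceptAllSource(final boolean enabled) {
        return new FilterPushDownSource() {
            @Override
            public boolean filterPushDownEnabled() {
                return enabled;
            }

            @Override
            public Result doApplyFilters(List<String> filters) {
                return new Result(new ArrayList<>(filters), Collections.<String>emptyList());
            }
        };
    }

    public static void main(String[] args) {
        List<String> filters = Collections.singletonList("a > 1");
        Result off = acceptAllSource(false).applyFilters(filters);
        Result on = acceptAllSource(true).applyFilters(filters);
        System.out.println("disabled: accepted=" + off.acceptedFilters + " remaining=" + off.remainingFilters);
        System.out.println("enabled: accepted=" + on.acceptedFilters + " remaining=" + on.remainingFilters);
    }
}
```

With a default method like this, a connector would only implement the two abstract methods and could not forget the option check. An abstract base class could hold the same logic, but it would force a class hierarchy on existing connectors.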
Best,
Hang

On Mon, Oct 30, 2023 at 14:45, Jiabao Sun <jiabao....@xtransfer.cn.invalid> wrote:

> Hi, all,
>
> Thanks for the lively discussion.
>
> Based on the discussion, I have made some adjustments to the FLIP document:
>
> 1. The name of the newly added option has been changed to
> "source.filter-push-down.enabled".
> 2. Considering compatibility with older versions, the newly added
> "source.filter-push-down.enabled" option needs to respect the optimizer's
> "table.optimizer.source.predicate-pushdown-enabled" option.
> But there is a consideration to remove the old option in Flink 2.0.
> 3. We can provide more options to disable other source abilities with side
> effects, such as "source.aggregate.enabled" and "source.projection.enabled".
> This is not urgent and can be introduced incrementally.
>
> Looking forward to your feedback again.
>
> Best,
> Jiabao
>
> > On Oct 29, 2023, at 08:45, Becket Qin <becket....@gmail.com> wrote:
> >
> > Thanks for digging into the git history, Jark. I agree it makes sense to
> > deprecate this API in 2.0.
> >
> > Cheers,
> >
> > Jiangjie (Becket) Qin
> >
> > On Fri, Oct 27, 2023 at 5:47 PM Jark Wu <imj...@gmail.com> wrote:
> >
> >> Hi Becket,
> >>
> >> I checked the history of
> >> "*table.optimizer.source.predicate-pushdown-enabled*".
> >> It seems it was introduced with the legacy FilterableTableSource interface,
> >> which might have been an experimental feature at that time. I don't see the
> >> necessity of this option at the moment. Maybe we can deprecate this option
> >> and drop it in Flink 2.0[1] if it is not necessary anymore. This may help to
> >> simplify this discussion.
> >>
> >> Best,
> >> Jark
> >>
> >> [1]: https://issues.apache.org/jira/browse/FLINK-32383
> >>
> >> On Thu, 26 Oct 2023 at 10:14, Becket Qin <becket....@gmail.com> wrote:
> >>
> >>> Thanks for the proposal, Jiabao. My two cents below:
> >>>
> >>> 1.
If I understand correctly, the motivation of the FLIP is mainly to
> >>> make predicate pushdown optional on SOME of the Sources. If so, intuitively
> >>> the configuration should be Source specific instead of general. Otherwise,
> >>> we will end up with general configurations that may not take effect for
> >>> some of the Source implementations. This violates the basic rule of a
> >>> configuration - it does what it says, regardless of the implementation.
> >>> While configuration standardization is usually a good thing, it should not
> >>> break the basic rules.
> >>> If we really want to have this general configuration, the sources to which
> >>> this configuration does not apply should throw an exception to make
> >>> it clear that this configuration is not supported. However, that seems ugly.
> >>>
> >>> 2. I think the actual motivation of this FLIP is about "how a source
> >>> should implement predicate pushdown efficiently", not "whether predicate
> >>> pushdown should be applied to the source." For example, if a source wants
> >>> to avoid additional computing load in the external system, it can always
> >>> read the entire record and apply the predicates by itself. However, from
> >>> the Flink perspective, the predicate pushdown is applied; it is just
> >>> implemented differently by the source. So the design principle here is that
> >>> Flink only cares about whether a source supports predicate pushdown or not;
> >>> it does not care about the implementation efficiency / side effects of
> >>> predicate pushdown. It is the Source implementation's responsibility to
> >>> ensure the predicate pushdown is implemented efficiently and does not
> >>> impose excessive pressure on the external system. And it is OK to have
> >>> additional configurations to achieve this goal. Obviously, such
> >>> configurations will be source specific in this case.
> >>>
> >>> 3.
Regarding the existing configuration
> >>> *table.optimizer.source.predicate-pushdown-enabled*, I am not sure why we
> >>> need it. Supposedly, if a source implements a SupportsXXXPushDown interface,
> >>> the optimizer should push the corresponding predicates to the Source. I am
> >>> not sure in which case this configuration would be used. Any ideas
> >>> @Jark Wu <imj...@gmail.com>?
> >>>
> >>> Thanks,
> >>>
> >>> Jiangjie (Becket) Qin
> >>>
> >>> On Wed, Oct 25, 2023 at 11:55 PM Jiabao Sun
> >>> <jiabao....@xtransfer.cn.invalid> wrote:
> >>>
> >>>> Thanks Jane for the detailed explanation.
> >>>>
> >>>> I think that for users, we should respect conventions over configurations.
> >>>> Conventions can be default values explicitly specified in
> >>>> configurations, or they can be behaviors that follow previous versions.
> >>>> If the same code has different behaviors in different versions, it would
> >>>> be a very bad thing.
> >>>>
> >>>> I agree that for regular users, it is not necessary to understand all
> >>>> the configurations related to Flink.
> >>>> By following conventions, they can have a good experience.
> >>>>
> >>>> Let's get back to the practical situation and consider it.
> >>>>
> >>>> Case 1:
> >>>> The user is not familiar with the purpose of the
> >>>> table.optimizer.source.predicate-pushdown-enabled configuration but follows
> >>>> the convention of allowing predicate pushdown to the source by default.
> >>>> Just understanding the source.predicate-pushdown-enabled configuration
> >>>> and performing fine-grained toggle control will work well.
> >>>>
> >>>> Case 2:
> >>>> The user understands the meaning of the
> >>>> table.optimizer.source.predicate-pushdown-enabled configuration and has set
> >>>> its value to false.
> >>>> We have reason to believe that the user understands the meaning of the
> >>>> predicate pushdown configuration and that the intention is to disable
> >>>> predicate pushdown (rather than whether or not to allow it).
> >>>> The previous choice of globally disabling it is likely because it
> >>>> couldn't be disabled on individual sources.
> >>>> From this perspective, if we provide more fine-grained configuration
> >>>> support and provide detailed explanations of the configuration behaviors in
> >>>> the documentation,
> >>>> users can clearly understand the differences between these two
> >>>> configurations and use them correctly.
> >>>>
> >>>> Also, I don't agree that
> >>>> table.optimizer.source.predicate-pushdown-enabled = true and
> >>>> source.predicate-pushdown-enabled = false means that the local
> >>>> configuration overrides the global configuration.
> >>>> On the contrary, both configurations are functioning correctly.
> >>>> The optimizer allows predicate pushdown to all sources, but some sources
> >>>> can reject the filters pushed down by the optimizer.
> >>>> This is natural, just like different components at different levels are
> >>>> responsible for different tasks.
> >>>>
> >>>> The more serious issue is that if "source.predicate-pushdown-enabled"
> >>>> does not respect "table.optimizer.source.predicate-pushdown-enabled",
> >>>> the "table.optimizer.source.predicate-pushdown-enabled" configuration
> >>>> will be invalidated.
> >>>> This means that regardless of whether
> >>>> "table.optimizer.source.predicate-pushdown-enabled" is set to true or
> >>>> false, it will have no effect.
> >>>>
> >>>> Best,
> >>>> Jiabao
> >>>>
> >>>>> On Oct 25, 2023, at 22:24, Jane Chan <qingyue....@gmail.com> wrote:
> >>>>>
> >>>>> Hi Jiabao,
> >>>>>
> >>>>> Thanks for the in-depth clarification.
Here are my two cents.
> >>>>>
> >>>>>> However, "table.optimizer.source.predicate-pushdown-enabled" and
> >>>>>> "scan.filter-push-down.enabled" are configurations for different
> >>>>>> components (optimizer and source operator).
> >>>>>
> >>>>> We cannot assume that every user would be interested in understanding the
> >>>>> internal components of Flink, such as the optimizer or connectors, and the
> >>>>> specific configurations associated with each component. Instead, users
> >>>>> might be more concerned about knowing which configuration enables or
> >>>>> disables the filter push-down feature for all source connectors, and which
> >>>>> parameter provides the flexibility to override this behavior for a single
> >>>>> source if needed.
> >>>>>
> >>>>> So, from this perspective, I am inclined to divide these two parameters
> >>>>> based on the scope of their impact from the user's perspective (i.e.
> >>>>> global-level or operator-level), rather than categorizing them based on the
> >>>>> component hierarchy from a developer's point of view. Therefore, based on
> >>>>> this premise, it is intuitive and natural for users to
> >>>>> understand that fine-grained configuration options can override global
> >>>>> configurations.
> >>>>>
> >>>>>> Additionally, if "scan.filter-push-down.enabled" doesn't respect
> >>>>>> "table.optimizer.source.predicate-pushdown-enabled" and the default value
> >>>>>> of "scan.filter-push-down.enabled" is defined as true,
> >>>>>> it means that just modifying
> >>>>>> "table.optimizer.source.predicate-pushdown-enabled" as false will have no
> >>>>>> effect, and filter pushdown will still be performed.
> >>>>>>
> >>>>>> If we define the default value of "scan.filter-push-down.enabled" as
> >>>>>> false, it would introduce a difference in behavior compared to the previous
> >>>>>> version.
> >>>>>>
> >>>>>
> >>>>> <1> If I understand correctly, "scan.filter-push-down.enabled" is a
> >>>>> connector option, which means the only way to configure it is to explicitly
> >>>>> specify it in DDL (no matter whether disable or enable), and the SET
> >>>>> command is not applicable, so I think it's natural to still respect the
> >>>>> user's specification here. Otherwise, users might be more confused about
> >>>>> why the DDL does not work as expected, and the reason is just because some
> >>>>> other "optimizer" configuration is set to a different value.
> >>>>>
> >>>>> <2> From the implementation side, I am inclined to keep the parameter's
> >>>>> priority consistent for all conditions.
> >>>>>
> >>>>> Let "global" denote "table.optimizer.source.predicate-pushdown-enabled",
> >>>>> and let "per-source" denote "scan.filter-push-down.enabled" for a specific
> >>>>> source T. The following truth table (based on the current design)
> >>>>> indicates the inconsistent behavior for "per-source override global".
> >>>>>
> >>>>> .--------.------------.-----------------.----------------------------.
> >>>>> | global | per-source | push-down for T | per-source override global |
> >>>>> :--------+------------+-----------------+----------------------------:
> >>>>> | true   | false      | false           | Y                          |
> >>>>> :--------+------------+-----------------+----------------------------:
> >>>>> | false  | true       | false           | N                          |
> >>>>> .--------.------------.-----------------.----------------------------.
> >>>>>
> >>>>> Best,
> >>>>> Jane
> >>>>>
> >>>>> On Wed, Oct 25, 2023 at 6:22 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid>
> >>>>> wrote:
> >>>>>
> >>>>>> Thanks Benchao for the feedback.
> >>>>>>
> >>>>>> I understand that the configuration of global parallelism and task
> >>>>>> parallelism is at different granularities but with the same configuration.
> >>>>>> However, "table.optimizer.source.predicate-pushdown-enabled" and
> >>>>>> "scan.filter-push-down.enabled" are configurations for different
> >>>>>> components (optimizer and source operator).
> >>>>>>
> >>>>>> From a user's perspective, there are two scenarios:
> >>>>>>
> >>>>>> 1. Disabling all filter pushdown
> >>>>>> In this case, setting "table.optimizer.source.predicate-pushdown-enabled"
> >>>>>> to false is sufficient to meet the requirement.
> >>>>>>
> >>>>>> 2. Disabling filter pushdown for specific sources
> >>>>>> In this scenario, there is no need to adjust the value of
> >>>>>> "table.optimizer.source.predicate-pushdown-enabled".
> >>>>>> Instead, the focus should be on the configuration of
> >>>>>> "scan.filter-push-down.enabled" to meet the requirement.
> >>>>>> In this case, users do not need to set
> >>>>>> "table.optimizer.source.predicate-pushdown-enabled" to false and manually
> >>>>>> enable filter pushdown for specific sources.
> >>>>>>
> >>>>>> Additionally, if "scan.filter-push-down.enabled" doesn't respect
> >>>>>> "table.optimizer.source.predicate-pushdown-enabled" and the default value
> >>>>>> of "scan.filter-push-down.enabled" is defined as true,
> >>>>>> it means that just modifying
> >>>>>> "table.optimizer.source.predicate-pushdown-enabled" as false will have no
> >>>>>> effect, and filter pushdown will still be performed.
> >>>>>>
> >>>>>> If we define the default value of "scan.filter-push-down.enabled" as
> >>>>>> false, it would introduce a difference in behavior compared to the previous
> >>>>>> version.
> >>>>>> The same SQL query that could successfully push down filters in the old
> >>>>>> version would no longer do so after the upgrade.
> >>>>>>
> >>>>>> Best,
> >>>>>> Jiabao
> >>>>>>
> >>>>>>> On Oct 25, 2023, at 17:10, Benchao Li <libenc...@apache.org> wrote:
> >>>>>>>
> >>>>>>> Thanks Jiabao for the detailed explanations, that helps a lot, I
> >>>>>>> understand your rationale now.
> >>>>>>>
> >>>>>>> Correct me if I'm wrong. Your perspective is that of a "developer", which
> >>>>>>> means there is an optimizer and a connector component, and if we want to
> >>>>>>> enable this feature (pushing filters down into connectors), you must
> >>>>>>> enable it first in the optimizer, and only then does the connector have
> >>>>>>> the chance to decide to use it or not.
> >>>>>>>
> >>>>>>> My perspective is that of a "user" (why should a user care about the
> >>>>>>> difference between optimizer and connector?): this is a feature, and there
> >>>>>>> are two ways to control it, one is to configure it at job level, the other
> >>>>>>> is in table properties. What a user expects is that they can control a
> >>>>>>> feature in a tiered way, that is, setting it per job and then
> >>>>>>> fine-tuning it per table.
> >>>>>>>
> >>>>>>> This is somewhat similar to other concepts, such as parallelism:
> >>>>>>> users can set a job-level default parallelism, and then fine-tune
> >>>>>>> it per operator. There may be more such debates in the future,
> >>>>>>> e.g., we can have a job-level config about adding key-by before lookup
> >>>>>>> join, and also a hint/table property way to control it at a fine-grained
> >>>>>>> level per lookup operator. Hence we'd better find a unified way for all
> >>>>>>> features of this kind.
> >>>>>>>
> >>>>>>> On Wed, Oct 25, 2023 at 15:27, Jiabao Sun <jiabao....@xtransfer.cn.invalid> wrote:
> >>>>>>>>
> >>>>>>>> Thanks Jane for further explanation.
> >>>>>>>>
> >>>>>>>> These two configurations correspond to different levels.
> >>>>>>>> "scan.filter-push-down.enabled" does not make
> >>>>>>>> "table.optimizer.source.predicate" invalid.
> >>>>>>>> The planner will still push down predicates to all sources.
> >>>>>>>> Whether filter pushdown is allowed or not is determined by the specific
> >>>>>>>> source's "scan.filter-push-down.enabled" configuration.
> >>>>>>>>
> >>>>>>>> However, "table.optimizer.source.predicate" does directly affect
> >>>>>>>> "scan.filter-push-down.enabled".
> >>>>>>>> When the planner disables predicate pushdown, the source-level filter
> >>>>>>>> pushdown will also not be executed, even if the source allows filter
> >>>>>>>> pushdown.
> >>>>>>>>
> >>>>>>>> Anyway, on points 1 and 2, our expectations are consistent.
> >>>>>>>> For the 3rd point, I still think that the planner-level configuration
> >>>>>>>> takes precedence over the source-level configuration.
> >>>>>>>> It may seem counterintuitive when we globally disable predicate
> >>>>>>>> pushdown but allow filter pushdown at the source level.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jiabao
> >>>>>>>>
> >>>>>>>>> On Oct 25, 2023, at 14:35, Jane Chan <qingyue....@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Jiabao,
> >>>>>>>>>
> >>>>>>>>> Thanks for clarifying this. By "scan.filter-push-down.enabled takes a
> >>>>>>>>> higher priority" I meant that this value should be respected whenever it
> >>>>>>>>> is set explicitly.
> >>>>>>>>>
> >>>>>>>>> The conclusion that
> >>>>>>>>>
> >>>>>>>>>> 2. "table.optimizer.source.predicate" = "true" and
> >>>>>>>>>> "scan.filter-push-down.enabled" = "false"
> >>>>>>>>>> Allow the planner to perform predicate pushdown, but individual sources do
> >>>>>>>>>> not enable filter pushdown.
> >>>>>>>>>
> >>>>>>>>> This indicates that the option "scan.filter-push-down.enabled = false" for
> >>>>>>>>> an individual source connector does indeed override the global-level
> >>>>>>>>> planner settings to make a difference. And thus "has a higher priority".
> >>>>>>>>>
> >>>>>>>>> While for
> >>>>>>>>>
> >>>>>>>>>> 3. "table.optimizer.source.predicate" = "false"
> >>>>>>>>>> Predicate pushdown is not allowed for the planner.
> >>>>>>>>>> Regardless of the value of the "scan.filter-push-down.enabled"
> >>>>>>>>>> configuration, filter pushdown is disabled.
> >>>>>>>>>> In this scenario, the behavior remains consistent with the old version as
> >>>>>>>>>> well.
> >>>>>>>>>
> >>>>>>>>> I still think "scan.filter-push-down.enabled" should also be respected if
> >>>>>>>>> it is enabled for individual connectors. WDYT?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Jane
> >>>>>>>>>
> >>>>>>>>> On Wed, Oct 25, 2023 at 1:27 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks Benchao for the feedback.
> >>>>>>>>>>
> >>>>>>>>>> For the current proposal, we recommend keeping the default value of
> >>>>>>>>>> "table.optimizer.source.predicate" as true,
> >>>>>>>>>> and setting the default value of the newly introduced option
> >>>>>>>>>> "scan.filter-push-down.enabled" to true as well.
> >>>>>>>>>>
> >>>>>>>>>> The main purpose of doing this is to maintain consistency with previous
> >>>>>>>>>> versions, as whether to perform
> >>>>>>>>>> filter pushdown in the old version solely depends on the
> >>>>>>>>>> "table.optimizer.source.predicate" option.
> >>>>>>>>>> That means by default, as long as a TableSource implements the
> >>>>>>>>>> SupportsFilterPushDown interface, filter pushdown is allowed.
> >>>>>>>>>> And it seems that we don't have much benefit in changing the default value
> >>>>>>>>>> of "table.optimizer.source.predicate" to false.
> >>>>>>>>>>
> >>>>>>>>>> Regarding the priority of these two configurations, I believe that
> >>>>>>>>>> "table.optimizer.source.predicate"
> >>>>>>>>>> takes precedence over "scan.filter-push-down.enabled" and it exhibits the
> >>>>>>>>>> following behavior.
> >>>>>>>>>>
> >>>>>>>>>> 1. "table.optimizer.source.predicate" = "true" and
> >>>>>>>>>> "scan.filter-push-down.enabled" = "true"
> >>>>>>>>>> This is the default behavior, allowing filter pushdown for sources.
> >>>>>>>>>>
> >>>>>>>>>> 2. "table.optimizer.source.predicate" = "true" and
> >>>>>>>>>> "scan.filter-push-down.enabled" = "false"
> >>>>>>>>>> Allow the planner to perform predicate pushdown, but individual sources do
> >>>>>>>>>> not enable filter pushdown.
> >>>>>>>>>>
> >>>>>>>>>> 3. "table.optimizer.source.predicate" = "false"
> >>>>>>>>>> Predicate pushdown is not allowed for the planner.
> >>>>>>>>>> Regardless of the value of the "scan.filter-push-down.enabled"
> >>>>>>>>>> configuration, filter pushdown is disabled.
> >>>>>>>>>> In this scenario, the behavior remains consistent with the old version as
> >>>>>>>>>> well.
> >>>>>>>>>>
> >>>>>>>>>> From an implementation perspective, setting the priority of
> >>>>>>>>>> "scan.filter-push-down.enabled" higher than
> >>>>>>>>>> "table.optimizer.source.predicate" is difficult to achieve now,
> >>>>>>>>>> because the PushFilterIntoSourceScanRuleBase at the planner level takes
> >>>>>>>>>> precedence over the source-level FilterPushDownSpec.
> >>>>>>>>>> Only when the PushFilterIntoSourceScanRuleBase is enabled will the
> >>>>>>>>>> source-level filter pushdown be performed.
> >>>>>>>>>>
> >>>>>>>>>> Additionally, in my opinion, there doesn't seem to be much benefit in
> >>>>>>>>>> setting a higher priority for "scan.filter-push-down.enabled".
> >>>>>>>>>> It may instead affect compatibility and increase implementation complexity.
> >>>>>>>>>>
> >>>>>>>>>> WDYT?
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Jiabao
> >>>>>>>>>>
> >>>>>>>>>>> On Oct 25, 2023, at 11:56, Benchao Li <libenc...@apache.org> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I agree with Jane that fine-grained configurations should have higher
> >>>>>>>>>>> priority than job level configurations.
> >>>>>>>>>>>
> >>>>>>>>>>> For the current proposal, we can achieve that:
> >>>>>>>>>>> - Set "table.optimizer.source.predicate" = "true" to enable by
> >>>>>>>>>>> default, and set "scan.filter-push-down.enabled" = "false" to disable
> >>>>>>>>>>> it per table source
> >>>>>>>>>>> - Set "table.optimizer.source.predicate" = "false" to disable by
> >>>>>>>>>>> default, and set "scan.filter-push-down.enabled" = "true" to enable
> >>>>>>>>>>> it per table source
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Oct 24, 2023 at 23:55, Jane Chan <qingyue....@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I believe that the configuration "table.optimizer.source.predicate" has a
> >>>>>>>>>>>>> higher priority at the planner level than the configuration at the source
> >>>>>>>>>>>>> level,
> >>>>>>>>>>>>> and it seems easy to implement now.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Correct me if I'm wrong, but I think the fine-grained configuration
> >>>>>>>>>>>> "scan.filter-push-down.enabled" should have a higher priority because the
> >>>>>>>>>>>> default value of "table.optimizer.source.predicate" is true.
> >>>>>>>>>>>> As a result,
> >>>>>>>>>>>> turning off filter push-down for a specific source will not take effect
> >>>>>>>>>>>> unless the default value of "table.optimizer.source.predicate" is changed
> >>>>>>>>>>>> to false, or, alternatively, let users manually set
> >>>>>>>>>>>> "table.optimizer.source.predicate" to false first and then selectively
> >>>>>>>>>>>> enable filter push-down for the desired sources, which is less intuitive.
> >>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Jane
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks Jane,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I believe that the configuration "table.optimizer.source.predicate" has a
> >>>>>>>>>>>>> higher priority at the planner level than the configuration at the source
> >>>>>>>>>>>>> level,
> >>>>>>>>>>>>> and it seems easy to implement now.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Jiabao
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Oct 24, 2023, at 17:36, Jane Chan <qingyue....@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Jiabao,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for driving this discussion. I have a small question: will
> >>>>>>>>>>>>>> "scan.filter-push-down.enabled" take precedence over
> >>>>>>>>>>>>>> "table.optimizer.source.predicate" when the two parameters might conflict
> >>>>>>>>>>>>>> with each other?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Jane
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks Jark,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If we only add configuration without adding the enableFilterPushDown
> >>>>>>>>>>>>>>> method in the SupportsFilterPushDown interface,
> >>>>>>>>>>>>>>> each connector would have to handle the same logic in the applyFilters
> >>>>>>>>>>>>>>> method to determine whether filter pushdown is needed.
> >>>>>>>>>>>>>>> This would increase complexity and violate the original behavior of the
> >>>>>>>>>>>>>>> applyFilters method.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On the contrary, we only need to pass the configuration parameter in the
> >>>>>>>>>>>>>>> newly added enableFilterPushDown method
> >>>>>>>>>>>>>>> to decide whether to perform predicate pushdown.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I think this approach would be clearer and simpler.
> >>>>>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Jiabao
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Oct 24, 2023, at 16:58, Jark Wu <imj...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Jiabao,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I think the current interface can already satisfy your requirements.
> >>>>>>>>>>>>>>>> The connector can reject all the filters by returning the input filters
> >>>>>>>>>>>>>>>> as `Result#remainingFilters`.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> So maybe we don't need to introduce a new method to disable
> >>>>>>>>>>>>>>>> pushdown, but just introduce an option for the specific connector.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu <xbjt...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks @Jiabao for kicking off this discussion.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Could you add a section to explain the difference between the proposed
> >>>>>>>>>>>>>>>>> connector level config `scan.filter-push-down.enabled` and the existing
> >>>>>>>>>>>>>>>>> query level config `table.optimizer.source.predicate-pushdown-enabled`?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Leonard
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Oct 24, 2023, at 4:18 PM, Jiabao Sun <jiabao....@xtransfer.cn.INVALID> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hi Devs,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I would like to start a discussion on FLIP-377: support configuration to
> >>>>>>>>>>>>>>>>>> disable filter pushdown for Table/SQL Sources[1].
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Currently, Flink Table/SQL does not expose fine-grained control for
> >>>>>>>>>>>>>>>>>> users to enable or disable filter pushdown.
> >>>>>>>>>>>>>>>>>> However, filter pushdown has some side effects, such as additional
> >>>>>>>>>>>>>>>>>> computational pressure on external systems.
> >>>>>>>>>>>>>>>>>> Moreover, improper queries can lead to issues such as full table scans,
> >>>>>>>>>>>>>>>>>> which in turn can impact the stability of external systems.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Suppose we have an SQL query with two sources: Kafka and a database.
> >>>>>>>>>>>>>>>>>> The database is sensitive to pressure, and we want to configure it to
> >>>>>>>>>>>>>>>>>> not perform filter pushdown to the database source.
> >>>>>>>>>>>>>>>>>> However, we still want to perform filter pushdown to the Kafka source to
> >>>>>>>>>>>>>>>>>> decrease network IO.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I propose to support configuration to disable filter pushdown for
> >>>>>>>>>>>>>>>>>> Table/SQL sources to let users decide whether to perform filter pushdown.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Looking forward to your feedback.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>> Jiabao
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Benchao Li
> >>>>>>>
> >>>>>>> --
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Benchao Li