Thank you all for the lively discussion! Agree with Benchao that from a user's (rather than a developer's) point of view, it's easier to understand that fine-grained options override global options.
In addition, for the new option 'scan.filter-push-down.enabled', would it be better to keep the name consistent with the global option 'table.optimizer.source.predicate-pushdown-enabled' ? e.g. 'source.predicate-pushdown.enabled' (here, source contains LookupTableSource and ScanTableSource) Best, Lincoln Lee Benchao Li <libenc...@apache.org> 于2023年10月25日周三 17:11写道: > Thanks Jiabao for the detailed explanations, that helps a lot, I > understand your rationale now. > > Correct me if I'm wrong. Your perspective is from "developer", which > means there is an optimizer and connector component, and if we want to > enable this feature (pushing filters down into connectors), you must > enable it firstly in optimizer, and only then connector has the chance > to decide to use it or not. > > My perspective is from "user" that (Why a user should care about the > difference of optimizer/connector) , this is a feature, and has two > way to control it, one way is to config it job-level, the other one is > in table properties. What a user expects is that they can control a > feature in a tiered way, that setting it per job, and then > fine-grained tune it per table. > > This is some kind of similar to other concepts, such as parallelism, > users can set a job level default parallelism, and then fine-grained > tune it per operator. There may be more such debate in the future > e.g., we can have a job level config about adding key-by before lookup > join, and also a hint/table property way to fine-grained control it > per lookup operator. Hence we'd better find a unified way for all > those similar kind of features. > > Jiabao Sun <jiabao....@xtransfer.cn.invalid> 于2023年10月25日周三 15:27写道: > > > > Thanks Jane for further explanation. > > > > These two configurations correspond to different levels. > "scan.filter-push-down.enabled" does not make > "table.optimizer.source.predicate" invalid. > > The planner will still push down predicates to all sources. > > Whether filter pushdown is allowed or not is determined by the specific > source's "scan.filter-push-down.enabled" configuration. > > > > However, "table.optimizer.source.predicate" does directly affect > "scan.filter-push-down.enabled”. > > When the planner disables predicate pushdown, the source-level filter > pushdown will also not be executed, even if the source allows filter > pushdown. > > > > Whatever, in point 1 and 2, our expectation is consistent. > > For the 3rd point, I still think that the planner-level configuration > takes precedence over the source-level configuration. > > It may seem counterintuitive when we globally disable predicate pushdown > but allow filter pushdown at the source level. > > > > Best, > > Jiabao > > > > > > > > > 2023年10月25日 14:35,Jane Chan <qingyue....@gmail.com> 写道: > > > > > > Hi Jiabao, > > > > > > Thanks for clarifying this. While by "scan.filter-push-down.enabled > takes a > > > higher priority" I meant that this value should be respected whenever > it is > > > set explicitly. > > > > > > The conclusion that > > > > > > 2. "table.optimizer.source.predicate" = "true" and > > >> "scan.filter-push-down.enabled" = "false" > > >> Allow the planner to perform predicate pushdown, but individual > sources do > > >> not enable filter pushdown. > > >> > > > > > > This indicates that the option "scan.filter-push-down.enabled = false" > for > > > an individual source connector does indeed override the global-level > > > planner settings to make a difference. And thus "has a higher > priority". > > > > > > While for > > > > > > 3. "table.optimizer.source.predicate" = "false" > > >> Predicate pushdown is not allowed for the planner. > > >> Regardless of the value of the "scan.filter-push-down.enabled" > > >> configuration, filter pushdown is disabled. > > >> In this scenario, the behavior remains consistent with the old > version as > > >> well. > > >> > > > > > > I still think "scan.filter-push-down.enabled" should also be respected > if > > > it is enabled for individual connectors. WDYT? > > > > > > Best, > > > Jane > > > > > > On Wed, Oct 25, 2023 at 1:27 PM Jiabao Sun <jiabao....@xtransfer.cn > .invalid> > > > wrote: > > > > > >> Thanks Benchao for the feedback. > > >> > > >> For the current proposal, we recommend keeping the default value of > > >> "table.optimizer.source.predicate" as true, > > >> and setting the the default value of newly introduced option > > >> "scan.filter-push-down.enabled" to true as well. > > >> > > >> The main purpose of doing this is to maintain consistency with > previous > > >> versions, as whether to perform > > >> filter pushdown in the old version solely depends on the > > >> "table.optimizer.source.predicate" option. > > >> That means by default, as long as a TableSource implements the > > >> SupportsFilterPushDown interface, filter pushdown is allowed. > > >> And it seems that we don't have much benefit in changing the default > value > > >> of "table.optimizer.source.predicate" to false. > > >> > > >> Regarding the priority of these two configurations, I believe that > > >> "table.optimizer.source.predicate" > > >> takes precedence over "scan.filter-push-down.enabled" and it exhibits > the > > >> following behavior. > > >> > > >> 1. "table.optimizer.source.predicate" = "true" and > > >> "scan.filter-push-down.enabled" = "true" > > >> This is the default behavior, allowing filter pushdown for sources. > > >> > > >> 2. "table.optimizer.source.predicate" = "true" and > > >> "scan.filter-push-down.enabled" = "false" > > >> Allow the planner to perform predicate pushdown, but individual > sources do > > >> not enable filter pushdown. > > >> > > >> 3. "table.optimizer.source.predicate" = "false" > > >> Predicate pushdown is not allowed for the planner. > > >> Regardless of the value of the "scan.filter-push-down.enabled" > > >> configuration, filter pushdown is disabled. > > >> In this scenario, the behavior remains consistent with the old > version as > > >> well. > > >> > > >> > > >> From an implementation perspective, setting the priority of > > >> "scan.filter-push-down.enabled" higher than > > >> "table.optimizer.source.predicate" is difficult to achieve now. > > >> Because the PushFilterIntoSourceScanRuleBase at the planner level > takes > > >> precedence over the source-level FilterPushDownSpec. > > >> Only when the PushFilterIntoSourceScanRuleBase is enabled, will the > > >> Source-level filter pushdown be performed. > > >> > > >> Additionally, in my opinion, there doesn't seem to be much benefit in > > >> setting a higher priority for "scan.filter-push-down.enabled". > > >> It may instead affect compatibility and increase implementation > complexity. > > >> > > >> WDYT? > > >> > > >> Best, > > >> Jiabao > > >> > > >> > > >>> 2023年10月25日 11:56,Benchao Li <libenc...@apache.org> 写道: > > >>> > > >>> I agree with Jane that fine-grained configurations should have higher > > >>> priority than job level configurations. > > >>> > > >>> For current proposal, we can achieve that: > > >>> - Set "table.optimizer.source.predicate" = "true" to enable by > > >>> default, and set ""scan.filter-push-down.enabled" = "false" to > disable > > >>> it per table source > > >>> - Set "table.optimizer.source.predicate" = "false" to disable by > > >>> default, and set ""scan.filter-push-down.enabled" = "true" to enable > > >>> it per table source > > >>> > > >>> Jane Chan <qingyue....@gmail.com> 于2023年10月24日周二 23:55写道: > > >>>> > > >>>>> > > >>>>> I believe that the configuration "table.optimizer.source.predicate" > > >> has a > > >>>>> higher priority at the planner level than the configuration at the > > >> source > > >>>>> level, > > >>>>> and it seems easy to implement now. > > >>>>> > > >>>> > > >>>> Correct me if I'm wrong, but I think the fine-grained configuration > > >>>> "scan.filter-push-down.enabled" should have a higher priority > because > > >> the > > >>>> default value of "table.optimizer.source.predicate" is true. As a > > >> result, > > >>>> turning off filter push-down for a specific source will not take > effect > > >>>> unless the default value of "table.optimizer.source.predicate" is > > >> changed > > >>>> to false, or, alternatively, let users manually set > > >>>> "table.optimizer.source.predicate" to false first and then > selectively > > >>>> enable filter push-down for the desired sources, which is less > > >> intuitive. > > >>>> WDYT? > > >>>> > > >>>> Best, > > >>>> Jane > > >>>> > > >>>> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun <jiabao....@xtransfer.cn > > >> .invalid> > > >>>> wrote: > > >>>> > > >>>>> Thanks Jane, > > >>>>> > > >>>>> I believe that the configuration "table.optimizer.source.predicate" > > >> has a > > >>>>> higher priority at the planner level than the configuration at the > > >> source > > >>>>> level, > > >>>>> and it seems easy to implement now. > > >>>>> > > >>>>> Best, > > >>>>> Jiabao > > >>>>> > > >>>>> > > >>>>>> 2023年10月24日 17:36,Jane Chan <qingyue....@gmail.com> 写道: > > >>>>>> > > >>>>>> Hi Jiabao, > > >>>>>> > > >>>>>> Thanks for driving this discussion. I have a small question that > will > > >>>>>> "scan.filter-push-down.enabled" take precedence over > > >>>>>> "table.optimizer.source.predicate" when the two parameters might > > >> conflict > > >>>>>> each other? > > >>>>>> > > >>>>>> Best, > > >>>>>> Jane > > >>>>>> > > >>>>>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun < > jiabao....@xtransfer.cn > > >>>>> .invalid> > > >>>>>> wrote: > > >>>>>> > > >>>>>>> Thanks Jark, > > >>>>>>> > > >>>>>>> If we only add configuration without adding the > enableFilterPushDown > > >>>>>>> method in the SupportsFilterPushDown interface, > > >>>>>>> each connector would have to handle the same logic in the > > >> applyFilters > > >>>>>>> method to determine whether filter pushdown is needed. > > >>>>>>> This would increase complexity and violate the original behavior > of > > >> the > > >>>>>>> applyFilters method. > > >>>>>>> > > >>>>>>> On the contrary, we only need to pass the configuration > parameter in > > >> the > > >>>>>>> newly added enableFilterPushDown method > > >>>>>>> to decide whether to perform predicate pushdown. > > >>>>>>> > > >>>>>>> I think this approach would be clearer and simpler. > > >>>>>>> WDYT? > > >>>>>>> > > >>>>>>> Best, > > >>>>>>> Jiabao > > >>>>>>> > > >>>>>>> > > >>>>>>>> 2023年10月24日 16:58,Jark Wu <imj...@gmail.com> 写道: > > >>>>>>>> > > >>>>>>>> Hi JIabao, > > >>>>>>>> > > >>>>>>>> I think the current interface can already satisfy your > requirements. > > >>>>>>>> The connector can reject all the filters by returning the input > > >> filters > > >>>>>>>> as `Result#remainingFilters`. > > >>>>>>>> > > >>>>>>>> So maybe we don't need to introduce a new method to disable > > >>>>>>>> pushdown, but just introduce an option for the specific > connector. > > >>>>>>>> > > >>>>>>>> Best, > > >>>>>>>> Jark > > >>>>>>>> > > >>>>>>>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu <xbjt...@gmail.com> > wrote: > > >>>>>>>> > > >>>>>>>>> Thanks @Jiabao for kicking off this discussion. > > >>>>>>>>> > > >>>>>>>>> Could you add a section to explain the difference between > proposed > > >>>>>>>>> connector level config `scan.filter-push-down.enabled` and > existing > > >>>>>>> query > > >>>>>>>>> level config > `table.optimizer.source.predicate-pushdown-enabled` ? > > >>>>>>>>> > > >>>>>>>>> Best, > > >>>>>>>>> Leonard > > >>>>>>>>> > > >>>>>>>>>> 2023年10月24日 下午4:18,Jiabao Sun <jiabao....@xtransfer.cn > .INVALID> > > >> 写道: > > >>>>>>>>>> > > >>>>>>>>>> Hi Devs, > > >>>>>>>>>> > > >>>>>>>>>> I would like to start a discussion on FLIP-377: support > > >> configuration > > >>>>>>> to > > >>>>>>>>> disable filter pushdown for Table/SQL Sources[1]. > > >>>>>>>>>> > > >>>>>>>>>> Currently, Flink Table/SQL does not expose fine-grained > control > > >> for > > >>>>>>>>> users to enable or disable filter pushdown. > > >>>>>>>>>> However, filter pushdown has some side effects, such as > additional > > >>>>>>>>> computational pressure on external systems. > > >>>>>>>>>> Moreover, Improper queries can lead to issues such as full > table > > >>>>> scans, > > >>>>>>>>> which in turn can impact the stability of external systems. > > >>>>>>>>>> > > >>>>>>>>>> Suppose we have an SQL query with two sources: Kafka and a > > >> database. > > >>>>>>>>>> The database is sensitive to pressure, and we want to > configure > > >> it to > > >>>>>>>>> not perform filter pushdown to the database source. > > >>>>>>>>>> However, we still want to perform filter pushdown to the Kafka > > >> source > > >>>>>>> to > > >>>>>>>>> decrease network IO. > > >>>>>>>>>> > > >>>>>>>>>> I propose to support configuration to disable filter push > down for > > >>>>>>>>> Table/SQL sources to let user decide whether to perform filter > > >>>>> pushdown. > > >>>>>>>>>> > > >>>>>>>>>> Looking forward to your feedback. > > >>>>>>>>>> > > >>>>>>>>>> [1] > > >>>>>>>>> > > >>>>>>> > > >>>>> > > >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768 > > >>>>>>>>>> > > >>>>>>>>>> Best, > > >>>>>>>>>> Jiabao > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>> > > >>>>> > > >>> > > >>> > > >>> > > >>> -- > > >>> > > >>> Best, > > >>> Benchao Li > > >> > > >> > > > > > -- > > Best, > Benchao Li >