Hi, all, Thanks for the lively discussion.
I agree with Jiabao. I think enabling "scan.filter-push-down.enabled" relies on enabling "table.optimizer.source.predicate-pushdown-enabled". It is a little strange that the planner still needs to push down the filters when we set "scan.filter-push-down.enabled=false" and "table.optimizer.source.predicate-pushdown-enabled=true". Maybe we need to add some checks to warn the users when setting "scan.filter-push-down.enabled=true" and "table.optimizer.source.predicate-pushdown-enabled=false". Besides that, I am +1 for renaming 'scan.filter-push-down.enabled' to 'source.predicate-pushdown.enabled'. Best, Hang Jiabao Sun <jiabao....@xtransfer.cn.invalid> 于2023年10月25日周三 18:23写道: > Thanks Benchao for the feedback. > > I understand that the configuration of global parallelism and task > parallelism is at different granularities but with the same configuration. > However, "table.optimizer.source.predicate-pushdown-enabled" and > "scan.filter-push-down.enabled" are configurations for different > components(optimizer and source operator). > > From a user's perspective, there are two scenarios: > > 1. Disabling all filter pushdown > In this case, setting "table.optimizer.source.predicate-pushdown-enabled" > to false is sufficient to meet the requirement. > > 2. Disabling filter pushdown for specific sources > In this scenario, there is no need to adjust the value of > "table.optimizer.source.predicate-pushdown-enabled". > Instead, the focus should be on the configuration of > "scan.filter-push-down.enabled" to meet the requirement. > In this case, users do not need to set > "table.optimizer.source.predicate-pushdown-enabled" to false and manually > enable filter pushdown for specific sources. > > Additionally, if "scan.filter-push-down.enabled" doesn't respect to > "table.optimizer.source.predicate-pushdown-enabled" and the default value > of "scan.filter-push-down.enabled" is defined as true, > it means that just modifying > "table.optimizer.source.predicate-pushdown-enabled" as false will have no > effect, and filter pushdown will still be performed. > > If we define the default value of "scan.filter-push-down.enabled" as > false, it would introduce a difference in behavior compared to the previous > version. > The same SQL query that could successfully push down filters in the old > version but would no longer do so after the upgrade. > > Best, > Jiabao > > > > 2023年10月25日 17:10,Benchao Li <libenc...@apache.org> 写道: > > > > Thanks Jiabao for the detailed explanations, that helps a lot, I > > understand your rationale now. > > > > Correct me if I'm wrong. Your perspective is from "developer", which > > means there is an optimizer and connector component, and if we want to > > enable this feature (pushing filters down into connectors), you must > > enable it firstly in optimizer, and only then connector has the chance > > to decide to use it or not. > > > > My perspective is from "user" that (Why a user should care about the > > difference of optimizer/connector) , this is a feature, and has two > > way to control it, one way is to config it job-level, the other one is > > in table properties. What a user expects is that they can control a > > feature in a tiered way, that setting it per job, and then > > fine-grained tune it per table. > > > > This is some kind of similar to other concepts, such as parallelism, > > users can set a job level default parallelism, and then fine-grained > > tune it per operator. There may be more such debate in the future > > e.g., we can have a job level config about adding key-by before lookup > > join, and also a hint/table property way to fine-grained control it > > per lookup operator. Hence we'd better find a unified way for all > > those similar kind of features. > > > > Jiabao Sun <jiabao....@xtransfer.cn.invalid> 于2023年10月25日周三 15:27写道: > >> > >> Thanks Jane for further explanation. > >> > >> These two configurations correspond to different levels. > "scan.filter-push-down.enabled" does not make > "table.optimizer.source.predicate" invalid. > >> The planner will still push down predicates to all sources. > >> Whether filter pushdown is allowed or not is determined by the specific > source's "scan.filter-push-down.enabled" configuration. > >> > >> However, "table.optimizer.source.predicate" does directly affect > "scan.filter-push-down.enabled”. > >> When the planner disables predicate pushdown, the source-level filter > pushdown will also not be executed, even if the source allows filter > pushdown. > >> > >> Whatever, in point 1 and 2, our expectation is consistent. > >> For the 3rd point, I still think that the planner-level configuration > takes precedence over the source-level configuration. > >> It may seem counterintuitive when we globally disable predicate > pushdown but allow filter pushdown at the source level. > >> > >> Best, > >> Jiabao > >> > >> > >> > >>> 2023年10月25日 14:35,Jane Chan <qingyue....@gmail.com> 写道: > >>> > >>> Hi Jiabao, > >>> > >>> Thanks for clarifying this. While by "scan.filter-push-down.enabled > takes a > >>> higher priority" I meant that this value should be respected whenever > it is > >>> set explicitly. > >>> > >>> The conclusion that > >>> > >>> 2. "table.optimizer.source.predicate" = "true" and > >>>> "scan.filter-push-down.enabled" = "false" > >>>> Allow the planner to perform predicate pushdown, but individual > sources do > >>>> not enable filter pushdown. > >>>> > >>> > >>> This indicates that the option "scan.filter-push-down.enabled = false" > for > >>> an individual source connector does indeed override the global-level > >>> planner settings to make a difference. And thus "has a higher > priority". > >>> > >>> While for > >>> > >>> 3. "table.optimizer.source.predicate" = "false" > >>>> Predicate pushdown is not allowed for the planner. > >>>> Regardless of the value of the "scan.filter-push-down.enabled" > >>>> configuration, filter pushdown is disabled. > >>>> In this scenario, the behavior remains consistent with the old > version as > >>>> well. > >>>> > >>> > >>> I still think "scan.filter-push-down.enabled" should also be respected > if > >>> it is enabled for individual connectors. WDYT? > >>> > >>> Best, > >>> Jane > >>> > >>> On Wed, Oct 25, 2023 at 1:27 PM Jiabao Sun <jiabao....@xtransfer.cn > .invalid> > >>> wrote: > >>> > >>>> Thanks Benchao for the feedback. > >>>> > >>>> For the current proposal, we recommend keeping the default value of > >>>> "table.optimizer.source.predicate" as true, > >>>> and setting the the default value of newly introduced option > >>>> "scan.filter-push-down.enabled" to true as well. > >>>> > >>>> The main purpose of doing this is to maintain consistency with > previous > >>>> versions, as whether to perform > >>>> filter pushdown in the old version solely depends on the > >>>> "table.optimizer.source.predicate" option. > >>>> That means by default, as long as a TableSource implements the > >>>> SupportsFilterPushDown interface, filter pushdown is allowed. > >>>> And it seems that we don't have much benefit in changing the default > value > >>>> of "table.optimizer.source.predicate" to false. > >>>> > >>>> Regarding the priority of these two configurations, I believe that > >>>> "table.optimizer.source.predicate" > >>>> takes precedence over "scan.filter-push-down.enabled" and it exhibits > the > >>>> following behavior. > >>>> > >>>> 1. "table.optimizer.source.predicate" = "true" and > >>>> "scan.filter-push-down.enabled" = "true" > >>>> This is the default behavior, allowing filter pushdown for sources. > >>>> > >>>> 2. "table.optimizer.source.predicate" = "true" and > >>>> "scan.filter-push-down.enabled" = "false" > >>>> Allow the planner to perform predicate pushdown, but individual > sources do > >>>> not enable filter pushdown. > >>>> > >>>> 3. "table.optimizer.source.predicate" = "false" > >>>> Predicate pushdown is not allowed for the planner. > >>>> Regardless of the value of the "scan.filter-push-down.enabled" > >>>> configuration, filter pushdown is disabled. > >>>> In this scenario, the behavior remains consistent with the old > version as > >>>> well. > >>>> > >>>> > >>>> From an implementation perspective, setting the priority of > >>>> "scan.filter-push-down.enabled" higher than > >>>> "table.optimizer.source.predicate" is difficult to achieve now. > >>>> Because the PushFilterIntoSourceScanRuleBase at the planner level > takes > >>>> precedence over the source-level FilterPushDownSpec. > >>>> Only when the PushFilterIntoSourceScanRuleBase is enabled, will the > >>>> Source-level filter pushdown be performed. > >>>> > >>>> Additionally, in my opinion, there doesn't seem to be much benefit in > >>>> setting a higher priority for "scan.filter-push-down.enabled". > >>>> It may instead affect compatibility and increase implementation > complexity. > >>>> > >>>> WDYT? > >>>> > >>>> Best, > >>>> Jiabao > >>>> > >>>> > >>>>> 2023年10月25日 11:56,Benchao Li <libenc...@apache.org> 写道: > >>>>> > >>>>> I agree with Jane that fine-grained configurations should have higher > >>>>> priority than job level configurations. > >>>>> > >>>>> For current proposal, we can achieve that: > >>>>> - Set "table.optimizer.source.predicate" = "true" to enable by > >>>>> default, and set ""scan.filter-push-down.enabled" = "false" to > disable > >>>>> it per table source > >>>>> - Set "table.optimizer.source.predicate" = "false" to disable by > >>>>> default, and set ""scan.filter-push-down.enabled" = "true" to enable > >>>>> it per table source > >>>>> > >>>>> Jane Chan <qingyue....@gmail.com> 于2023年10月24日周二 23:55写道: > >>>>>> > >>>>>>> > >>>>>>> I believe that the configuration "table.optimizer.source.predicate" > >>>> has a > >>>>>>> higher priority at the planner level than the configuration at the > >>>> source > >>>>>>> level, > >>>>>>> and it seems easy to implement now. > >>>>>>> > >>>>>> > >>>>>> Correct me if I'm wrong, but I think the fine-grained configuration > >>>>>> "scan.filter-push-down.enabled" should have a higher priority > because > >>>> the > >>>>>> default value of "table.optimizer.source.predicate" is true. As a > >>>> result, > >>>>>> turning off filter push-down for a specific source will not take > effect > >>>>>> unless the default value of "table.optimizer.source.predicate" is > >>>> changed > >>>>>> to false, or, alternatively, let users manually set > >>>>>> "table.optimizer.source.predicate" to false first and then > selectively > >>>>>> enable filter push-down for the desired sources, which is less > >>>> intuitive. > >>>>>> WDYT? > >>>>>> > >>>>>> Best, > >>>>>> Jane > >>>>>> > >>>>>> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun <jiabao....@xtransfer.cn > >>>> .invalid> > >>>>>> wrote: > >>>>>> > >>>>>>> Thanks Jane, > >>>>>>> > >>>>>>> I believe that the configuration "table.optimizer.source.predicate" > >>>> has a > >>>>>>> higher priority at the planner level than the configuration at the > >>>> source > >>>>>>> level, > >>>>>>> and it seems easy to implement now. > >>>>>>> > >>>>>>> Best, > >>>>>>> Jiabao > >>>>>>> > >>>>>>> > >>>>>>>> 2023年10月24日 17:36,Jane Chan <qingyue....@gmail.com> 写道: > >>>>>>>> > >>>>>>>> Hi Jiabao, > >>>>>>>> > >>>>>>>> Thanks for driving this discussion. I have a small question that > will > >>>>>>>> "scan.filter-push-down.enabled" take precedence over > >>>>>>>> "table.optimizer.source.predicate" when the two parameters might > >>>> conflict > >>>>>>>> each other? > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jane > >>>>>>>> > >>>>>>>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun < > jiabao....@xtransfer.cn > >>>>>>> .invalid> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Thanks Jark, > >>>>>>>>> > >>>>>>>>> If we only add configuration without adding the > enableFilterPushDown > >>>>>>>>> method in the SupportsFilterPushDown interface, > >>>>>>>>> each connector would have to handle the same logic in the > >>>> applyFilters > >>>>>>>>> method to determine whether filter pushdown is needed. > >>>>>>>>> This would increase complexity and violate the original behavior > of > >>>> the > >>>>>>>>> applyFilters method. > >>>>>>>>> > >>>>>>>>> On the contrary, we only need to pass the configuration > parameter in > >>>> the > >>>>>>>>> newly added enableFilterPushDown method > >>>>>>>>> to decide whether to perform predicate pushdown. > >>>>>>>>> > >>>>>>>>> I think this approach would be clearer and simpler. > >>>>>>>>> WDYT? > >>>>>>>>> > >>>>>>>>> Best, > >>>>>>>>> Jiabao > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> 2023年10月24日 16:58,Jark Wu <imj...@gmail.com> 写道: > >>>>>>>>>> > >>>>>>>>>> Hi JIabao, > >>>>>>>>>> > >>>>>>>>>> I think the current interface can already satisfy your > requirements. > >>>>>>>>>> The connector can reject all the filters by returning the input > >>>> filters > >>>>>>>>>> as `Result#remainingFilters`. > >>>>>>>>>> > >>>>>>>>>> So maybe we don't need to introduce a new method to disable > >>>>>>>>>> pushdown, but just introduce an option for the specific > connector. > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> Jark > >>>>>>>>>> > >>>>>>>>>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu <xbjt...@gmail.com> > wrote: > >>>>>>>>>> > >>>>>>>>>>> Thanks @Jiabao for kicking off this discussion. > >>>>>>>>>>> > >>>>>>>>>>> Could you add a section to explain the difference between > proposed > >>>>>>>>>>> connector level config `scan.filter-push-down.enabled` and > existing > >>>>>>>>> query > >>>>>>>>>>> level config > `table.optimizer.source.predicate-pushdown-enabled` ? > >>>>>>>>>>> > >>>>>>>>>>> Best, > >>>>>>>>>>> Leonard > >>>>>>>>>>> > >>>>>>>>>>>> 2023年10月24日 下午4:18,Jiabao Sun <jiabao....@xtransfer.cn > .INVALID> > >>>> 写道: > >>>>>>>>>>>> > >>>>>>>>>>>> Hi Devs, > >>>>>>>>>>>> > >>>>>>>>>>>> I would like to start a discussion on FLIP-377: support > >>>> configuration > >>>>>>>>> to > >>>>>>>>>>> disable filter pushdown for Table/SQL Sources[1]. > >>>>>>>>>>>> > >>>>>>>>>>>> Currently, Flink Table/SQL does not expose fine-grained > control > >>>> for > >>>>>>>>>>> users to enable or disable filter pushdown. > >>>>>>>>>>>> However, filter pushdown has some side effects, such as > additional > >>>>>>>>>>> computational pressure on external systems. > >>>>>>>>>>>> Moreover, Improper queries can lead to issues such as full > table > >>>>>>> scans, > >>>>>>>>>>> which in turn can impact the stability of external systems. > >>>>>>>>>>>> > >>>>>>>>>>>> Suppose we have an SQL query with two sources: Kafka and a > >>>> database. > >>>>>>>>>>>> The database is sensitive to pressure, and we want to > configure > >>>> it to > >>>>>>>>>>> not perform filter pushdown to the database source. > >>>>>>>>>>>> However, we still want to perform filter pushdown to the Kafka > >>>> source > >>>>>>>>> to > >>>>>>>>>>> decrease network IO. > >>>>>>>>>>>> > >>>>>>>>>>>> I propose to support configuration to disable filter push > down for > >>>>>>>>>>> Table/SQL sources to let user decide whether to perform filter > >>>>>>> pushdown. > >>>>>>>>>>>> > >>>>>>>>>>>> Looking forward to your feedback. > >>>>>>>>>>>> > >>>>>>>>>>>> [1] > >>>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768 > >>>>>>>>>>>> > >>>>>>>>>>>> Best, > >>>>>>>>>>>> Jiabao > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>>>> > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> > >>>>> Best, > >>>>> Benchao Li > >>>> > >>>> > >> > > > > > > -- > > > > Best, > > Benchao Li > >