[ https://issues.apache.org/jira/browse/FLINK-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885098#comment-15885098 ]
godfrey he edited comment on FLINK-5859 at 2/27/17 4:08 AM:
------------------------------------------------------------
Hi, [~fhueske], thanks for your advice. IMO, rules such as {{PushProjectIntoBatchTableSourceScanRule}}, {{PushFilterIntoBatchTableSourceScanRule}}, and {{PartitionPruningRule}} (which we may integrate into {{PushFilterIntoBatchTableSourceScanRule}}) need to be applied only once and do not actually need a cost model. Rules such as {{FilterCalcMergeRule}}, {{FilterJoinRule}}, and {{DataSetCalcRule}} do not need real costs; a dummy cost is enough. Rules such as {{LoptOptimizeJoinRule}} and {{JoinToMultiJoinRule}} must be applied with real costs. So we want to break the optimization down into 3 planner phases later. The whole optimization would include 5 steps:
# decorrelate the query
# normalize the logical plan with the HEP planner
# optimize the logical plan with the Volcano planner and dummy costs (including {{FilterCalcMergeRule}}, {{FilterJoinRule}}, {{DataSetCalcRule}}, and so on)
# optimize the physical plan with the HEP planner (including {{PushProjectIntoBatchTableSourceScanRule}}, {{PushFilterIntoBatchTableSourceScanRule}}, and so on)
# optimize the physical plan with the Volcano planner and real costs (including {{LoptOptimizeJoinRule}}, {{JoinToMultiJoinRule}}, and so on)
This way, each optimization phase keeps its complexity as small as possible, and your concern is addressed as well. Looking forward to your advice, thanks.

> support partition pruning on Table API & SQL
> --------------------------------------------
>
>                 Key: FLINK-5859
>                 URL: https://issues.apache.org/jira/browse/FLINK-5859
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: godfrey he
>            Assignee: godfrey he
>
> Many data sources are partitionable storage, e.g. HDFS and Druid, and many
> queries only need to read a small subset of the total data. We can use
> partition information to prune or skip over files irrelevant to the user's
> queries. Both query optimization time and execution time can be reduced
> significantly, especially for a large partitioned table.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
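The 5-step pipeline proposed in the comment above can be modeled schematically as an ordered list of phases, each with a planner kind (HEP or Volcano) and a cost mode (none, dummy, or real). This is a minimal illustrative sketch, not Flink's actual optimizer code; the class and enum names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Schematic model of the proposed 5-step optimization pipeline (illustrative only, not Flink code). */
public class OptimizationPipeline {

    enum Planner { NONE, HEP, VOLCANO }
    enum Cost { NONE, DUMMY, REAL }

    static final class Phase {
        final String name;
        final Planner planner;
        final Cost cost;
        final List<String> exampleRules;

        Phase(String name, Planner planner, Cost cost, String... exampleRules) {
            this.name = name;
            this.planner = planner;
            this.cost = cost;
            this.exampleRules = Arrays.asList(exampleRules);
        }
    }

    /** The five steps, in the order they would run. */
    static List<Phase> phases() {
        return Arrays.asList(
            new Phase("decorrelate query", Planner.NONE, Cost.NONE),
            new Phase("normalize logical plan", Planner.HEP, Cost.NONE),
            new Phase("optimize logical plan", Planner.VOLCANO, Cost.DUMMY,
                "FilterCalcMergeRule", "FilterJoinRule", "DataSetCalcRule"),
            new Phase("optimize physical plan (rule-based)", Planner.HEP, Cost.NONE,
                "PushProjectIntoBatchTableSourceScanRule",
                "PushFilterIntoBatchTableSourceScanRule"),
            new Phase("optimize physical plan (cost-based)", Planner.VOLCANO, Cost.REAL,
                "LoptOptimizeJoinRule", "JoinToMultiJoinRule"));
    }

    public static void main(String[] args) {
        for (Phase p : phases()) {
            System.out.println(p.name + " [" + p.planner + ", cost=" + p.cost + "]");
        }
    }
}
```

The point of the structure is that one-shot, cost-free rules never enter a cost-based search space, which keeps each phase's complexity small.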
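The partition-pruning idea in the issue description (skip files irrelevant to the query) can be illustrated with a toy sketch. The {{Partition}} type and the {{prune}} helper are hypothetical, invented for this example; they are not Flink's API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Toy illustration of partition pruning (hypothetical types, not Flink's API). */
public class PartitionPruning {

    static final class Partition {
        final String path; // e.g. an HDFS directory for one partition
        final int day;     // partition column value, e.g. dt=...

        Partition(String path, int day) {
            this.path = path;
            this.day = day;
        }
    }

    /** Keep only partitions whose partition-column value can satisfy the filter. */
    static List<Partition> prune(List<Partition> all, Predicate<Integer> dayFilter) {
        return all.stream()
                  .filter(p -> dayFilter.test(p.day))
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Partition> parts = Arrays.asList(
            new Partition("/t/dt=1", 1),
            new Partition("/t/dt=2", 2),
            new Partition("/t/dt=3", 3));
        // A filter like WHERE day >= 2 means only two of three partitions are ever read.
        List<Partition> kept = prune(parts, d -> d >= 2);
        System.out.println(kept.size()); // prints 2
    }
}
```

Because the pruning decision uses only partition metadata, it can run once during planning, which is why a rule like {{PartitionPruningRule}} needs no cost model.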