[ https://issues.apache.org/jira/browse/FLINK-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885098#comment-15885098 ]
godfrey he edited comment on FLINK-5859 at 2/27/17 4:08 AM:
------------------------------------------------------------
Hi, [~fhueske], thanks for your advice. IMO, rules such as {{PushProjectIntoBatchTableSourceScanRule}}, {{PushFilterIntoBatchTableSourceScanRule}}, and {{PartitionPruningRule}} (which we may integrate into {{PushFilterIntoBatchTableSourceScanRule}}) need to be applied only once and do not actually need a cost model. Rules such as {{FilterCalcMergeRule}}, {{FilterJoinRule}}, and {{DataSetCalcRule}} do not need real costs; a dummy cost is enough. Rules such as {{LoptOptimizeJoinRule}} and {{JoinToMultiJoinRule}} must be applied with real costs. So we want to break the optimization down into 3 planner phases later. The whole optimization would include 5 steps:
# decorrelate the query
# normalize the logical plan with the HEP planner
# optimize the logical plan with the Volcano planner and dummy costs (including {{FilterCalcMergeRule}}, {{FilterJoinRule}}, {{DataSetCalcRule}}, and so on)
# optimize the physical plan with the HEP planner (including {{PushProjectIntoBatchTableSourceScanRule}}, {{PushFilterIntoBatchTableSourceScanRule}}, and so on)
# optimize the physical plan with the Volcano planner and real costs (including {{LoptOptimizeJoinRule}}, {{JoinToMultiJoinRule}}, and so on)
This way, each optimization phase keeps its complexity as small as possible, and your concern is addressed as well. Looking forward to your advice, thanks.

> support partition pruning on Table API & SQL
> --------------------------------------------
>
>                 Key: FLINK-5859
>                 URL: https://issues.apache.org/jira/browse/FLINK-5859
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: godfrey he
>            Assignee: godfrey he
>
> Many data sources are partitionable storage, e.g. HDFS and Druid, and many
> queries only need to read a small subset of the total data. We can use
> partition information to prune or skip over files irrelevant to the user's
> queries. Both query optimization time and execution time can be reduced
> significantly, especially for a large partitioned table.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
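The 5-step pipeline proposed in the comment above can be modeled schematically as an ordered list of phases, each with a planner kind (HEP or Volcano) and a cost mode (none, dummy, or real). This is a minimal illustrative sketch, not Flink's actual optimizer code; the class and enum names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Schematic model of the proposed 5-step optimization pipeline (illustrative only, not Flink code). */
public class OptimizationPipeline {

    enum Planner { NONE, HEP, VOLCANO }
    enum Cost { NONE, DUMMY, REAL }

    static final class Phase {
        final String name;
        final Planner planner;
        final Cost cost;
        final List<String> exampleRules;

        Phase(String name, Planner planner, Cost cost, String... exampleRules) {
            this.name = name;
            this.planner = planner;
            this.cost = cost;
            this.exampleRules = Arrays.asList(exampleRules);
        }
    }

    /** The five steps, in the order they would run. */
    static List<Phase> phases() {
        return Arrays.asList(
            new Phase("decorrelate query", Planner.NONE, Cost.NONE),
            new Phase("normalize logical plan", Planner.HEP, Cost.NONE),
            new Phase("optimize logical plan", Planner.VOLCANO, Cost.DUMMY,
                "FilterCalcMergeRule", "FilterJoinRule", "DataSetCalcRule"),
            new Phase("optimize physical plan (rule-based)", Planner.HEP, Cost.NONE,
                "PushProjectIntoBatchTableSourceScanRule",
                "PushFilterIntoBatchTableSourceScanRule"),
            new Phase("optimize physical plan (cost-based)", Planner.VOLCANO, Cost.REAL,
                "LoptOptimizeJoinRule", "JoinToMultiJoinRule"));
    }

    public static void main(String[] args) {
        for (Phase p : phases()) {
            System.out.println(p.name + " [" + p.planner + ", cost=" + p.cost + "]");
        }
    }
}
```

The point of the structure is that one-shot, cost-free rules never enter a cost-based search space, which keeps each phase's complexity small.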
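The partition-pruning idea in the issue description (skip files irrelevant to the query) can be illustrated with a toy sketch. The {{Partition}} type and the {{prune}} helper are hypothetical, invented for this example; they are not Flink's API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Toy illustration of partition pruning (hypothetical types, not Flink's API). */
public class PartitionPruning {

    static final class Partition {
        final String path; // e.g. an HDFS directory for one partition
        final int day;     // partition column value, e.g. dt=...

        Partition(String path, int day) {
            this.path = path;
            this.day = day;
        }
    }

    /** Keep only partitions whose partition-column value can satisfy the filter. */
    static List<Partition> prune(List<Partition> all, Predicate<Integer> dayFilter) {
        return all.stream()
                  .filter(p -> dayFilter.test(p.day))
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Partition> parts = Arrays.asList(
            new Partition("/t/dt=1", 1),
            new Partition("/t/dt=2", 2),
            new Partition("/t/dt=3", 3));
        // A filter like WHERE day >= 2 means only two of three partitions are ever read.
        List<Partition> kept = prune(parts, d -> d >= 2);
        System.out.println(kept.size()); // prints 2
    }
}
```

Because the pruning decision uses only partition metadata, it can run once during planning, which is why a rule like {{PartitionPruningRule}} needs no cost model.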