[jira] [Commented] (FLINK-5226) Eagerly project unused attributes

zhangjing (JIRA) Fri, 02 Dec 2016 04:11:13 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714962#comment-15714962
 ]


zhangjing commented on FLINK-5226:
----------------------------------

Hi Fabian, FlinkRuleSets already contains ProjectJoinTransposeRule, which 
pushes projections into the inputs of a Join. In the case above, 
ProjectJoinTransposeRule is in fact applied, but the plan it produces is not 
chosen as the cheapest one by the VolcanoPlanner because of the cost model. 
When we worked on the projection push-down optimization, we changed 
computeSelfCost of BatchScan to take the column count into account and found 
that the plan with the pushed-down projection is then chosen as the best one; 
see PushProjectIntoBatchTableSourceScanITCase.testJoinOnScanSql in 
https://github.com/fhueske/flink/commit/a6a40e9b6dee4ab178f1e497c66dbc7e577b67e6.
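
To illustrate the effect of the cost model, here is a toy sketch in Python. 
It is not Calcite's or Flink's actual cost formula: the row counts, the 
operator lists, and the names rows_only_cost, width_aware_cost and cheapest 
are all assumptions made up for this example. It only shows why a cost that 
ignores the column count can prefer the plan without the pushed-down 
projections, while a width-aware cost prefers the pruned plan.

```python
# Toy model: a plan is a list of (operator, output_rows, output_columns).
# All numbers are hypothetical and chosen only for illustration.
PLANS = {
    # Join on the full scans, project afterwards.
    "wide": [
        ("scan_x", 1000, 5),
        ("scan_y", 1000, 5),
        ("join", 1000, 10),
        ("calc", 1000, 2),
    ],
    # Projections pushed below the join (the effect of ProjectJoinTransposeRule).
    "pruned": [
        ("scan_x", 1000, 5),
        ("calc_x", 1000, 1),   # keeps x.a
        ("scan_y", 1000, 5),
        ("calc_y", 1000, 2),   # keeps y.a, y.b
        ("join", 1000, 3),
        ("calc", 1000, 2),
    ],
}


def rows_only_cost(plan):
    """Cost that ignores row width: every operator pays for its rows only."""
    return sum(rows for _, rows, _ in plan)


def width_aware_cost(plan):
    """Cost that also accounts for the number of columns per row."""
    return sum(rows * columns for _, rows, columns in plan)


def cheapest(plans, cost):
    """Return the name of the plan with the lowest cost."""
    return min(plans, key=lambda name: cost(plans[name]))


print(cheapest(PLANS, rows_only_cost))
print(cheapest(PLANS, width_aware_cost))
```

With the rows-only cost, the two extra projection operators make the pruned 
plan look more expensive, so the wide plan wins; once the column count enters 
the cost, the narrower join outweighs the extra operators.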

> Eagerly project unused attributes
> ---------------------------------
>
>                 Key: FLINK-5226
>                 URL: https://issues.apache.org/jira/browse/FLINK-5226
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.2.0
>            Reporter: Fabian Hueske
>
> The optimizer currently does not eagerly remove unused attributes. 
> For example given a table {{tab5}} with five attributes {{a, b, c, d, e}}, 
> the following query
> {code}
> SELECT x.a, y.b FROM tab5 AS x, tab5 AS y WHERE x.a = y.a
> {code}
> would result in the non-optimized plan
> {code}
> LogicalProject(a=[$0], b=[$6])
>   LogicalFilter(condition=[=($0, $5)])
>     LogicalJoin(condition=[true], joinType=[inner])
>       LogicalTableScan(table=[[tab5]])
>       LogicalTableScan(table=[[tab5]])
> {code}
> and the optimized plan:
> {code}
> DataSetCalc(select=[a, b0 AS b])
>   DataSetJoin(where=[=(a, a0)], join=[a, b, c, d, e, a0, b0, c0, d0, e0], joinType=[InnerJoin])
>     DataSetScan(table=[[_DataSetTable_0]])
>     DataSetScan(table=[[_DataSetTable_0]])
> {code}
> This plan is inefficient because it joins all ten attributes of both tables 
> instead of eagerly projecting out all unused fields ({{x.b, x.c, x.d, x.e, 
> y.c, y.d, y.e}}).
> Since this is one of the most common optimizations, I would assume that 
> Calcite provides some rules to extract eager projections. If this is the 
> case, the issue can be solved by adding such rules to {{FlinkRuleSets}}.
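
For illustration, the field pruning described in the quoted issue can be 
sketched in Python. The helper name split_used_fields is hypothetical and 
this is not how Calcite implements its rules; the field indices are the ones 
that appear in the LogicalProject ($0, $6) and LogicalFilter ($0, $5) of the 
quoted plan.

```python
def split_used_fields(used, left_width, right_width):
    """Split join-level field indices into per-input local indices."""
    left = sorted(i for i in used if i < left_width)
    right = sorted(
        i - left_width
        for i in used
        if left_width <= i < left_width + right_width
    )
    return left, right


FIELDS = ["a", "b", "c", "d", "e"]  # attributes of tab5

# Fields referenced by the projection ($0, $6) and the join condition ($0, $5).
used = {0, 6} | {0, 5}

left, right = split_used_fields(used, len(FIELDS), len(FIELDS))

# Everything not referenced can be projected out before the join.
unused = [f"x.{FIELDS[i]}" for i in range(len(FIELDS)) if i not in left]
unused += [f"y.{FIELDS[i]}" for i in range(len(FIELDS)) if i not in right]

print(left, right)
print(unused)
```

The resulting unused list matches the seven fields named in the issue 
description.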



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
