[ https://issues.apache.org/jira/browse/FLINK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714962#comment-15714962 ]
zhangjing commented on FLINK-5226:
----------------------------------

Hi Fabian,

FlinkRuleSets already contains ProjectJoinTransposeRule, which pushes projections into the inputs of a Join. In the case above, ProjectJoinTransposeRule is in fact applied, but VolcanoPlanner does not choose the plan it produces as the cheapest one because of the cost model. When we implemented the projection pushdown optimization, we changed computeSelfCost of BatchScan to take the column count into account, and found that the pushed-down plan is then chosen as the best one; see PushProjectIntoBatchTableSourceScanITCase.testJoinOnScanSql in https://github.com/fhueske/flink/commit/a6a40e9b6dee4ab178f1e497c66dbc7e577b67e6.

> Eagerly project unused attributes
> ---------------------------------
>
>                 Key: FLINK-5226
>                 URL: https://issues.apache.org/jira/browse/FLINK-5226
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.2.0
>            Reporter: Fabian Hueske
>
> The optimizer does currently not eagerly remove unused attributes.
> For example given a table {{tab5}} with five attributes {{a, b, c, d, e}},
> the following query
> {code}
> SELECT x.a, y.b FROM tab5 AS x, tab5 AS y WHERE x.a = y.a
> {code}
> would result in the non-optimized plan
> {code}
> LogicalProject(a=[$0], b=[$6])
>   LogicalFilter(condition=[=($0, $5)])
>     LogicalJoin(condition=[true], joinType=[inner])
>       LogicalTableScan(table=[[tab5]])
>       LogicalTableScan(table=[[tab5]])
> {code}
> and the optimized plan:
> {code}
> DataSetCalc(select=[a, b0 AS b])
>   DataSetJoin(where=[=(a, a0)], join=[a, b, c, d, e, a0, b0, c0, d0, e0],
> joinType=[InnerJoin])
>     DataSetScan(table=[[_DataSetTable_0]])
>     DataSetScan(table=[[_DataSetTable_0]])
> {code}
> This plan is inefficient because it joins all ten attributes of both tables
> instead of eagerly projecting out all unused fields ({{x.b, x.c, x.d, x.e,
> y.c, y.d, y.e}}).
> Since this is one of the most common optimizations, I would assume that
> Calcite provides some rules to extract eager projections. If this is the
> case, the issue can be solved by adding such rules to {{FlinkRuleSets}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
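
To illustrate the cost-model point made in the comment, here is a minimal, self-contained sketch. It does not use the Calcite or Flink APIs; the class ColumnAwareCostSketch, both cost functions, and the row count are hypothetical and only model the idea that a scan cost which ignores column count cannot distinguish a projected scan from a full one, whereas a cost that scales with column count can.

```java
public class ColumnAwareCostSketch {

    /** Column-blind cost (assumed model): each scan costs its row count, however wide it is. */
    static double rowOnlyCost(double rowsPerScan, int[] scanWidths) {
        return rowsPerScan * scanWidths.length;
    }

    /** Column-aware cost (assumed model): each scan costs rows times the columns it emits. */
    static double columnAwareCost(double rowsPerScan, int[] scanWidths) {
        double total = 0.0;
        for (int width : scanWidths) {
            total += rowsPerScan * width;
        }
        return total;
    }

    public static void main(String[] args) {
        double rows = 1000.0;        // assumed row count of tab5
        int[] wideScans = {5, 5};    // both scans read a, b, c, d, e
        int[] narrowScans = {1, 2};  // x reads only a; y reads a and b

        // With the column-blind cost the two alternatives tie, so the planner
        // has no cost-based reason to prefer the plan with pushed-down projections.
        System.out.println("row-only     wide=" + rowOnlyCost(rows, wideScans)
                + " narrow=" + rowOnlyCost(rows, narrowScans));

        // With the column-aware cost the narrow scans are strictly cheaper.
        System.out.println("column-aware wide=" + columnAwareCost(rows, wideScans)
                + " narrow=" + columnAwareCost(rows, narrowScans));
    }
}
```

In the actual planner the total cost also includes the join and any extra projection operators, so this sketch only shows the direction of the effect, not the real numbers.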