[ https://issues.apache.org/jira/browse/FLINK-5266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735523#comment-15735523 ]
ASF GitHub Bot commented on FLINK-5266:
---------------------------------------

Github user KurtYoung commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2961#discussion_r91729382

    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/table.scala ---
    @@ -881,24 +883,21 @@ class GroupWindowedTable(
         * }}}
         */
       def select(fields: Expression*): Table = {
    --- End diff --

    What if a user uses a customized watermark extractor that reads some fields from the element? For example, suppose the original table source contains 4 fields: a, b, c, d, and the user uses field "a" to extract the timestamp and watermark. If the later query on the table uses only "b" and "c", and we project on "b" and "c" and that projection is pushed further into the table source, we will no longer get field "a", which is needed to produce the timestamp.

> Eagerly project unused fields when selecting aggregation fields
> ---------------------------------------------------------------
>
>                 Key: FLINK-5266
>                 URL: https://issues.apache.org/jira/browse/FLINK-5266
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Kurt Young
>            Assignee: Kurt Young
>
> When we call a table's {{select}} method and it contains aggregations,
> we project fields after the aggregation. It would be better to project out
> unused fields before the aggregation, which also leaves the opportunity to
> push the projection into the scan.
> For example, the current logical plan of a simple query:
> {code}
> table.select('a.sum as 's, 'a.max)
> {code}
> is
> {code}
> LogicalProject(s=[$0], TMP_2=[$1])
>   LogicalAggregate(group=[{}], TMP_0=[SUM($5)], TMP_1=[MAX($5)])
>     LogicalTableScan(table=[[supplier]])
> {code}
> It would be better if we could project out unused fields right after the scan,
> so that the plan looks like this:
> {code}
> LogicalProject(s=[$0], EXPR$1=[$0])
>   LogicalAggregate(group=[{}], EXPR$1=[SUM($0)])
>     LogicalProject(a=[$5])
>       LogicalTableScan(table=[[supplier]])
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
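The concern raised in the review comment can be sketched with a small, self-contained model. This is not Flink code: `Row`, `project`, `extractTimestamp`, and `requiredFields` are hypothetical names invented for illustration, and the fix shown (adding the extractor's input fields to the pushed-down projection) is only one possible way to address the problem, not something the pull request is confirmed to do.

```scala
// Hypothetical model, not the Flink API: a row maps field names to values.
object ProjectionPushdownSketch {
  type Row = Map[String, Long]

  // Keep only the requested fields, as a projection pushed into the
  // table source would.
  def project(rows: Seq[Row], fields: Set[String]): Seq[Row] =
    rows.map(_.filter { case (name, _) => fields.contains(name) })

  // Stand-in for a user-defined timestamp extractor that reads field "a".
  def extractTimestamp(row: Row): Option[Long] = row.get("a")

  // Assumed remedy: the scan keeps the query's fields plus every field
  // the timestamp/watermark extractor depends on.
  def requiredFields(queryFields: Set[String],
                     extractorFields: Set[String]): Set[String] =
    queryFields ++ extractorFields

  // Source with the four fields from the comment: a, b, c, d.
  val source: Seq[Row] = Seq(
    Map("a" -> 1000L, "b" -> 1L, "c" -> 2L, "d" -> 3L),
    Map("a" -> 2000L, "b" -> 4L, "c" -> 5L, "d" -> 6L)
  )

  // The later query only touches "b" and "c".
  val queryFields: Set[String] = Set("b", "c")

  // Naive pushdown drops "a", so no timestamp can be extracted.
  val naiveTimestamps: Seq[Option[Long]] =
    project(source, queryFields).map(extractTimestamp)

  // Including the extractor's input keeps the timestamp available
  // while still dropping the unused field "d".
  val safeRows: Seq[Row] =
    project(source, requiredFields(queryFields, Set("a")))
  val safeTimestamps: Seq[Option[Long]] = safeRows.map(extractTimestamp)
}
```

In this model the naive projection loses every timestamp, while the widened projection still drops "d" and keeps "a" available to the extractor.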