Yin Huai created HIVE-4809:
------------------------------

             Summary: ReduceSinkOperator of PTFOperator can have redundant key columns
                 Key: HIVE-4809
                 URL: https://issues.apache.org/jira/browse/HIVE-4809
             Project: Hive
          Issue Type: Improvement
            Reporter: Yin Huai

For example, take a simple query like this:
{code:sql}
SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a)
FROM src x;
{code}

Its plan is:
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        x
          TableScan
            alias: x
            Reduce Output Operator
              key expressions:
                    expr: a
                    type: int
                    expr: a
                    type: int
              sort order: ++
              Map-reduce partition columns:
                    expr: a
                    type: int
              tag: -1
              value expressions:
                    expr: a
                    type: int
                    expr: b
                    type: string
      Reduce Operator Tree:
        Extract
          PTF Operator
            Select Operator
              expressions:
                    expr: _col0
                    type: int
                    expr: _col1
                    type: string
                    expr: _wcol0
                    type: bigint
              outputColumnNames: _col0, _col1, _col2
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1
{code}

The ReduceSinkOperator lists "a" twice in its key expressions (sort order "++"). This redundancy can increase the size of the map output.
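
To illustrate the kind of pruning this issue asks for, here is a minimal Python sketch that drops repeated key expressions while keeping the first occurrence and its sort direction. It is not Hive code: `dedupe_keys` is a hypothetical helper, and it assumes key expressions can be compared as plain strings and that the sort order is a string of "+"/"-" characters, one per key.

```python
def dedupe_keys(key_exprs, sort_order):
    """Remove repeated key expressions, keeping the first occurrence.

    Hypothetical helper for illustration only; not part of Hive.
    key_exprs  -- list of key expression strings, e.g. ["a", "a"]
    sort_order -- one "+" or "-" per key expression, e.g. "++"
    """
    if len(key_exprs) != len(sort_order):
        raise ValueError("need exactly one sort direction per key expression")

    seen = set()
    kept_exprs = []
    kept_order = []
    for expr, direction in zip(key_exprs, sort_order):
        if expr in seen:
            # A later copy of the same expression cannot change the ordering:
            # rows that tie on the first copy also tie on this one.
            continue
        seen.add(expr)
        kept_exprs.append(expr)
        kept_order.append(direction)
    return kept_exprs, "".join(kept_order)


# Key expressions and sort order taken from the plan above.
keys, order = dedupe_keys(["a", "a"], "++")
print(keys, order)
```

Keeping the first occurrence preserves the sort order, because a repeated column can only break ties that the earlier copy has already left as ties.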