[ https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai updated HIVE-4809:
---------------------------
    Assignee: Yin Huai

> ReduceSinkOperator of PTFOperator can have redundant key columns
> ----------------------------------------------------------------
>
>                 Key: HIVE-4809
>                 URL: https://issues.apache.org/jira/browse/HIVE-4809
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> For example, take a simple query like this:
> {code:sql}
> SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a) FROM src x;
> {code}
> Its plan is:
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
>
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         x
>           TableScan
>             alias: x
>             Reduce Output Operator
>               key expressions:
>                     expr: a
>                     type: int
>                     expr: a
>                     type: int
>               sort order: ++
>               Map-reduce partition columns:
>                     expr: a
>                     type: int
>               tag: -1
>               value expressions:
>                     expr: a
>                     type: int
>                     expr: b
>                     type: string
>       Reduce Operator Tree:
>         Extract
>           PTF Operator
>             Select Operator
>               expressions:
>                     expr: _col0
>                     type: int
>                     expr: _col1
>                     type: string
>                     expr: _wcol0
>                     type: bigint
>               outputColumnNames: _col0, _col1, _col2
>               File Output Operator
>                 compressed: false
>                 GlobalTableId: 0
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {code}
> The ReduceSinkOperator lists column "a" twice in its key columns (note the
> sort order "++"). This redundancy can increase the size of the map output.
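To illustrate the redundancy described in the issue, here is a minimal Python sketch of dropping repeated key expressions while keeping the matching sort-order flags. It is not Hive code and not the fix proposed for HIVE-4809; the function name `dedup_key_columns` and the list/string representation of the key expressions and sort order are assumptions made for illustration only.

```python
def dedup_key_columns(key_exprs, sort_order):
    """Drop repeated key expressions, keeping the first occurrence of each.

    Hypothetical helper, not part of Hive. `key_exprs` is a list of column
    names and `sort_order` is a string with one '+' or '-' per key, mirroring
    the "key expressions" and "sort order" fields in the plan above.
    """
    if len(key_exprs) != len(sort_order):
        raise ValueError("need exactly one sort-order flag per key expression")

    seen = set()
    kept_exprs = []
    kept_flags = []
    for expr, flag in zip(key_exprs, sort_order):
        if expr in seen:
            # A repeated key cannot change the ordering: rows that tie on the
            # first occurrence also tie on every later occurrence.
            continue
        seen.add(expr)
        kept_exprs.append(expr)
        kept_flags.append(flag)
    return kept_exprs, "".join(kept_flags)


# The key columns from the plan in this issue: "a" twice, sort order "++".
keys, order = dedup_key_columns(["a", "a"], "++")
```

With the plan above as input, the keys reduce to a single "a" with sort order "+", so each map output record would carry the key column once instead of twice.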