Yin Huai created HIVE-4809:
------------------------------

             Summary: ReduceSinkOperator of PTFOperator can have redundant key columns
                 Key: HIVE-4809
                 URL: https://issues.apache.org/jira/browse/HIVE-4809
             Project: Hive
          Issue Type: Improvement
            Reporter: Yin Huai


For example, consider this simple query:
{code:sql}
SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a) FROM src x;
{code}

Its plan is:
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        x 
          TableScan
            alias: x
            Reduce Output Operator
              key expressions:
                    expr: a
                    type: int
                    expr: a
                    type: int
              sort order: ++
              Map-reduce partition columns:
                    expr: a
                    type: int
              tag: -1
              value expressions:
                    expr: a
                    type: int
                    expr: b
                    type: string
      Reduce Operator Tree:
        Extract
          PTF Operator
            Select Operator
              expressions:
                    expr: _col0
                    type: int
                    expr: _col1
                    type: string
                    expr: _wcol0
                    type: bigint
              outputColumnNames: _col0, _col1, _col2
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1
{code}

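The ReduceSinkOperator (shown as "Reduce Output Operator" in the plan) lists
column "a" twice in its key expressions, with sort order "++". The second
copy changes neither the sort nor the partitioning, but it is still part of
every map output key, so this redundancy can increase the size of map output.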
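
To illustrate the intended effect, here is a minimal sketch in Python. It is
not Hive code: dedupe_keys is a hypothetical helper, and key expressions are
modeled as plain strings with one sort-order character per key. It keeps the
first occurrence of each key expression together with its sort flag. That is
safe because a repeated key can never break a tie that its first occurrence
left unresolved.

```python
def dedupe_keys(key_exprs, sort_order):
    """Drop repeated key expressions, keeping the first occurrence and its flag.

    key_exprs  -- list of key expression strings, e.g. ["a", "a"]
    sort_order -- string with one '+' or '-' per key expression, e.g. "++"
    (Both representations are assumptions made for this sketch.)
    """
    if len(key_exprs) != len(sort_order):
        raise ValueError("one sort-order flag is required per key expression")

    seen = set()
    keys, order = [], []
    for expr, flag in zip(key_exprs, sort_order):
        if expr in seen:
            # A later duplicate cannot change the ordering, so skip it.
            continue
        seen.add(expr)
        keys.append(expr)
        order.append(flag)
    return keys, "".join(order)


# Keys and sort order taken from the plan above.
keys, order = dedupe_keys(["a", "a"], "++")
print(keys, order)  # ['a'] +
```

With the keys from the plan above, the result is a single key "a" with sort
order "+".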

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
