[jira] [Created] (HIVE-15581) Unable to use advanced aggregation with multiple inserts clause

James Ball (JIRA) Wed, 11 Jan 2017 00:36:12 -0800

James Ball created HIVE-15581:
---------------------------------

             Summary: Unable to use advanced aggregation with multiple inserts 
clause
                 Key: HIVE-15581
                 URL: https://issues.apache.org/jira/browse/HIVE-15581
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.2.1
            Reporter: James Ball



■Use Cases
- Use multiple insert clauses within a single query to insert multiple static 
(user-defined) partitions into a single table.
- Use advanced aggregation (cube) features within each insert clause to include 
subtotals of columns for each partition

■Expected Behaviour
- Subtotals are inserted for all combinations of the set of columns

■Observed Behaviour
- No subtotals are not inserted for any combination of the set of columns

■Sample Queries
{code:sql}
// Create test tables
create table if not exists
        table1
        (
                column1 string,
                column2 string,
                column3 int
                )
        stored as orc
        tblproperties
        (
                "orc.compress" = "SNAPPY"
                );

create table if not exists
        table2
        (
                column1 string,
                column2 string,
                column3 int
                )
        partitioned by
        (
                partition1 string
                )
        stored as orc
        tblproperties
        (
                "orc.compress" = "SNAPPY"
                );

create table if not exists
        table3
        (
                column1 string,
                column2 string,
                column3 int
                )
        partitioned by
        (
                partition1 string
                )
        stored as orc
        tblproperties
        (
                "orc.compress" = "SNAPPY"
                );
{code}

{code:sql}
// Insert test values
insert overwrite table
        table1
        values
                ('value1', 'value1', 1),
                ('value2', 'value2', 1),
                ('value3', 'value3', 1);
{code}

{code:sql}
// Single insert clause with multiple inserts syntax
// Subtotals are inserted into target table
from
        table1
insert overwrite table
        table2
        partition
        (
                partition1 = 'value1'
                )
        select
                column1,
                column2,
                sum(column3) as column3
        group by
                column1,
                column2
        with cube;
{code}

{code:sql}
// Multiple insert clauses with multiple inserts syntax
// Subtotals are not inserted into target table
from
        table1
insert overwrite table
        table3
        partition
        (
                partition1 = 'value1'
                )
        select
                column1,
                column2,
                sum(column3) as column3
        group by
                column1,
                column2
        with cube
insert overwrite table
        table3
        partition
        (
                partition1 = 'value2'
                )
        select
                column1,
                column2,
                sum(column3) as column3
        group by
                column1,
                column2
        with cube;
{code}

■Executions Plans
- Single insert clause with multiple inserts syntax
{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: table1
            Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column 
stats: NONE
            Select Operator
              expressions: column1 (type: string), column2 (type: string), 
column3 (type: int)
              outputColumnNames: column1, column2, column3
              Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE 
Column stats: NONE
              Group By Operator
                aggregations: sum(column3)
                keys: column1 (type: string), column2 (type: string), '0' 
(type: string)
                mode: hash
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 12 Data size: 2208 Basic stats: COMPLETE 
Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string), _col1 (type: string), 
_col2 (type: string)
                  sort order: +++
                  Map-reduce partition columns: _col0 (type: string), _col1 
(type: string), _col2 (type: string)
                  Statistics: Num rows: 12 Data size: 2208 Basic stats: 
COMPLETE Column stats: NONE
                  value expressions: _col3 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0)
          keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 
(type: string)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col3
          Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column 
stats: NONE
          pruneGroupingSetId: true
          Select Operator
            expressions: _col0 (type: string), _col1 (type: string), 
UDFToInteger(_col3) (type: int)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE 
Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE 
Column stats: NONE
              table:
                  input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                  output format: 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                  serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                  name: zzz_james_ball.table2

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            partition1 value1
          replace: true
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: zzz_james_ball.table2

  Stage: Stage-2
    Stats-Aggr Operator
{noformat}
- Single insert clause with multiple inserts syntax
{noformat}
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0
  Stage-1 depends on stages: Stage-2
  Stage-4 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: table1
            Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column 
stats: NONE
            Select Operator
              expressions: column1 (type: string), column2 (type: string), 
column3 (type: int)
              outputColumnNames: column1, column2, column3
              Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE 
Column stats: NONE
              Reduce Output Operator
                key expressions: column1 (type: string), column2 (type: string)
                sort order: ++
                Map-reduce partition columns: column1 (type: string), column2 
(type: string)
                Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE 
Column stats: NONE
                value expressions: column3 (type: int)
      Reduce Operator Tree:
        Forward
          Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column 
stats: NONE
          Group By Operator
            aggregations: sum(VALUE._col0)
            keys: KEY._col0 (type: string), KEY._col1 (type: string)
            mode: complete
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column 
stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), 
UDFToInteger(_col2) (type: int)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
Column stats: NONE
                table:
                    input format: 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: zzz_james_ball.table3
          Group By Operator
            aggregations: sum(VALUE._col0)
            keys: KEY._col0 (type: string), KEY._col1 (type: string)
            mode: complete
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column 
stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), 
UDFToInteger(_col2) (type: int)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
Column stats: NONE
                table:
                    input format: 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: zzz_james_ball.table3

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            partition1 value1
          replace: true
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: zzz_james_ball.table3

  Stage: Stage-3
    Stats-Aggr Operator

  Stage: Stage-1
    Move Operator
      tables:
          partition:
            partition1 value2
          replace: true
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: zzz_james_ball.table3

  Stage: Stage-4
    Stats-Aggr Operator
{noformat}

■Notes
- This problem occurs with all advanced aggregation features (cube, grouping 
sets, rollup)
- This problem occurs whether hive.map.aggr is set to true or false
- Dynamic partitions are not used because the partition values are set manually 
within the where conditions of each insert clause



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HIVE-15581) Unable to use advanced aggregation with multiple inserts clause

Reply via email to