James Ball created HIVE-15581: --------------------------------- Summary: Unable to use advanced aggregation with multiple inserts clause Key: HIVE-15581 URL: https://issues.apache.org/jira/browse/HIVE-15581 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: James Ball
■Use Cases - Use multiple insert clauses within a single query to insert multiple static (user-defined) partitions into a single table. - Use advanced aggregation (cube) features within each insert clause to include subtotals of columns for each partition ■Expected Behaviour - Subtotals are inserted for all combinations of the set of columns ■Observed Behaviour - No subtotals are not inserted for any combination of the set of columns ■Sample Queries {code:sql} // Create test tables create table if not exists table1 ( column1 string, column2 string, column3 int ) stored as orc tblproperties ( "orc.compress" = "SNAPPY" ); create table if not exists table2 ( column1 string, column2 string, column3 int ) partitioned by ( partition1 string ) stored as orc tblproperties ( "orc.compress" = "SNAPPY" ); create table if not exists table3 ( column1 string, column2 string, column3 int ) partitioned by ( partition1 string ) stored as orc tblproperties ( "orc.compress" = "SNAPPY" ); {code} {code:sql} // Insert test values insert overwrite table table1 values ('value1', 'value1', 1), ('value2', 'value2', 1), ('value3', 'value3', 1); {code} {code:sql} // Single insert clause with multiple inserts syntax // Subtotals are inserted into target table from table1 insert overwrite table table2 partition ( partition1 = 'value1' ) select column1, column2, sum(column3) as column3 group by column1, column2 with cube; {code} {code:sql} // Multiple insert clauses with multiple inserts syntax // Subtotals are not inserted into target table from table1 insert overwrite table table3 partition ( partition1 = 'value1' ) select column1, column2, sum(column3) as column3 group by column1, column2 with cube insert overwrite table table3 partition ( partition1 = 'value2' ) select column1, column2, sum(column3) as column3 group by column1, column2 with cube; {code} ■Executions Plans - Single insert clause with multiple inserts syntax {noformat} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: table1 Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: column1 (type: string), column2 (type: string), column3 (type: int) outputColumnNames: column1, column2, column3 Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(column3) keys: column1 (type: string), column2 (type: string), '0' (type: string) mode: hash outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 12 Data size: 2208 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string) sort order: +++ Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: string) Statistics: Num rows: 12 Data size: 2208 Basic stats: COMPLETE Column stats: NONE value expressions: _col3 (type: bigint) Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string) mode: mergepartial outputColumnNames: _col0, _col1, _col3 Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE pruneGroupingSetId: true Select Operator expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col3) (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: zzz_james_ball.table2 Stage: Stage-0 Move Operator tables: partition: partition1 value1 replace: true table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: zzz_james_ball.table2 Stage: Stage-2 Stats-Aggr Operator {noformat} - Single insert clause with multiple inserts syntax {noformat} STAGE DEPENDENCIES: Stage-2 is a root stage Stage-0 depends on stages: Stage-2 Stage-3 depends on stages: Stage-0 Stage-1 depends on stages: Stage-2 Stage-4 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-2 Map Reduce Map Operator Tree: TableScan alias: table1 Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: column1 (type: string), column2 (type: string), column3 (type: int) outputColumnNames: column1, column2, column3 Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: column1 (type: string), column2 (type: string) sort order: ++ Map-reduce partition columns: column1 (type: string), column2 (type: string) Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE value expressions: column3 (type: int) Reduce Operator Tree: Forward Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: string), KEY._col1 (type: string) mode: complete outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col2) (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: zzz_james_ball.table3 Group By Operator aggregations: sum(VALUE._col0) keys: KEY._col0 (type: string), KEY._col1 (type: string) mode: complete outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col2) (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: zzz_james_ball.table3 Stage: Stage-0 Move Operator tables: partition: partition1 value1 replace: true table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: zzz_james_ball.table3 Stage: Stage-3 Stats-Aggr Operator Stage: Stage-1 Move Operator tables: partition: partition1 value2 replace: true table: input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde name: zzz_james_ball.table3 Stage: Stage-4 Stats-Aggr Operator {noformat} ■Notes - This problem occurs with all advanced aggregation features (cube, grouping sets, rollup) - This problem occurs whether hive.map.aggr is set to true or false - Dynamic partitions are not used because the partition values are set manually within the where conditions of each insert clause -- This message was sent by Atlassian JIRA (v6.3.4#6332)