[ https://issues.apache.org/jira/browse/HIVE-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Koifman updated HIVE-15146:
----------------------------------
    Description:
Consider:
{noformat}
create table if not exists srcpart (a int, b int, c int)
  partitioned by (z int)
  clustered by (a) into 2 buckets
  stored as orc
  tblproperties("transactional"="true");
create temporary table if not exists data1 (x int);
insert into data1 values (1),(2),(3);
explain from data1
  insert into srcpart partition(z) select 0,0,1,x
  insert into srcpart partition(z=1) select 0,0,1;
{noformat}
Then the plan looks like:
{noformat}
2016-11-07T16:56:19,045 INFO  [main] ql.TestTxnCommands2: STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0
  Stage-4 depends on stages: Stage-2
  Stage-1 depends on stages: Stage-4
  Stage-5 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: data1
            Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: x (type: int)
              outputColumnNames: _col3
              Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                sort order:
                Map-reduce partition columns: 0 (type: int)
                Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col3 (type: int)
            Select Operator
              Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Reduce Operator Tree:
        Select Operator
          expressions: 0 (type: int), 0 (type: int), 1 (type: int), VALUE._col2 (type: int)
          outputColumnNames: _col0, _col1, _col2, _col3
          Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                name: default.srcpart

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            z
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.srcpart

  Stage: Stage-3
    Stats-Aggr Operator

  Stage: Stage-4
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              sort order:
              Map-reduce partition columns: 0 (type: int)
              Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Select Operator
          expressions: 0 (type: int), 0 (type: int), 1 (type: int)
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                name: default.srcpart

  Stage: Stage-1
    Move Operator
      tables:
          partition:
            z 1
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.srcpart

  Stage: Stage-5
    Stats-Aggr Operator
{noformat}
Note that there are 2 stats aggregation tasks, but both branches of the multi-insert update the same partition.

Once HIVE-14943 is in, there will be other ways to generate the same situation.
In particular, it will be possible to have 2 or 3 branches of the multi-insert, any or all of which use dynamic partition insert, which means the set of partitions actually updated is not known until run-time.
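
To make the run-time dependence concrete, here is a small Python sketch that models the two insert branches of the example. It is not Hive code: the function `partitions_written` and its parameters are hypothetical names chosen for illustration, and the model only computes which `z` partitions each branch would touch for the rows in `data1`.

```python
# Illustrative model only; names here are hypothetical, not Hive APIs.

# Rows of data1 from the example: insert into data1 values (1),(2),(3)
data1 = [{"x": 1}, {"x": 2}, {"x": 3}]


def partitions_written(rows, static_z=None, dynamic_z=None):
    """Return the set of z partitions one insert branch would touch."""
    if static_z is not None:
        # Static partition insert: the target is fixed in the query text.
        return {static_z}
    # Dynamic partition insert: the targets depend on the row values.
    return {dynamic_z(row) for row in rows}


# insert into srcpart partition(z) select 0,0,1,x   (dynamic, z comes from x)
branch_dynamic = partitions_written(data1, dynamic_z=lambda row: row["x"])

# insert into srcpart partition(z=1) select 0,0,1   (static, z=1)
branch_static = partitions_written(data1, static_z=1)

# Partitions touched by both branches, and by either branch.
overlap = branch_dynamic & branch_static
touched = branch_dynamic | branch_static

print("dynamic branch:", sorted(branch_dynamic))
print("static branch: ", sorted(branch_static))
print("overlap:       ", sorted(overlap))
print("all touched:   ", sorted(touched))
```

In this model the overlap between the branches can only be computed after the rows of `data1` have been read, which is the property the description points out for dynamic partition inserts.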
If at all possible, the solution should address this.

> Too many Stats-Aggr Operator in multi-insert
> --------------------------------------------
>
> Key: HIVE-15146
> URL: https://issues.apache.org/jira/browse/HIVE-15146
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Reporter: Eugene Koifman
> Assignee: Pengcheng Xiong

-- This message was sent by Atlassian JIRA (v6.3.4#6332)