[ https://issues.apache.org/jira/browse/HIVE-11603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702941#comment-14702941 ]
Nicholas Brenwald commented on HIVE-11603: ------------------------------------------ Query plan from branch-1: {code} :~/$ hive Logging initialized using configuration in jar:file:~/branch-1/hive/packaging/target/apache-hive-1.3.0-SNAPSHOT-bin/apache-hive-1.3.0-SNAPSHOT-bin/lib/hive-common-1.3.0-SNAPSHOT.jar!/hive-log4j.properties hive> EXPLAIN SELECT COUNT(*) from v1 WHERE c2 = 0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: t1 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE Filter Operator predicate: false (type: boolean) Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE Union Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL Select Operator Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL Group By Operator aggregations: count() mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL value expressions: _col0 (type: bigint) TableScan alias: t2 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Filter Operator predicate: (c2 = 0) (type: boolean) Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Select Operator Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Union Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL Select Operator Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL Group By Operator aggregations: count() mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL value expressions: _col0 (type: bigint) Reduce Operator Tree: Group By Operator aggregations: count(VALUE._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL File Output Operator compressed: false Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: PARTIAL table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {code} Query plan from branch master {code} :~/$ hive Logging initialized using configuration in jar:file:~/master/hive/packaging/target/apache-hive-2.0.0-SNAPSHOT-bin/apache-hive-2.0.0-SNAPSHOT-bin/lib/hive-common-2.0.0-SNAPSHOT.jar!/hive-log4j2.xml hive> EXPLAIN SELECT COUNT(*) from v1 WHERE c2 = 0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: t1 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Filter Operator predicate: false (type: boolean) Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Select Operator expressions: c1 (type: string), null (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Union Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE Select Operator Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE Group By Operator aggregations: count() mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) TableScan alias: t2 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Filter Operator predicate: (c2 = 0) (type: boolean) Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Select Operator expressions: c1 (type: string), 0 (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Union Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE Select Operator Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE Group By Operator aggregations: count() mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Reduce Operator Tree: Group By Operator aggregations: count(VALUE._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {code} > IndexOutOfBoundsException thrown when accessing a union all subquery and > filtering on a column which does not exist in all underlying tables > -------------------------------------------------------------------------------------------------------------------------------------------- > > Key: HIVE-11603 > URL: https://issues.apache.org/jira/browse/HIVE-11603 > Project: Hive > Issue Type: Bug > Affects Versions: 1.3.0 > Environment: Hadoop 2.6 > Reporter: Nicholas Brenwald > Priority: Minor > Fix For: 2.0.0 > > > Create two empty tables t1 and t2 > {code} > CREATE TABLE t1(c1 STRING); > CREATE TABLE t2(c1 STRING, c2 INT); > {code} > Create a view on these two tables > {code} > CREATE VIEW v1 AS > SELECT c1, c2 > FROM ( > SELECT c1, CAST(NULL AS INT) AS c2 FROM t1 > UNION ALL > SELECT c1, c2 FROM t2 > ) x; > {code} > Then run > {code} > SELECT COUNT(*) from v1 > WHERE c2 = 0; > {code} > We expect to get a result of zero, but instead the query fails with stack > trace: > {code} > Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:635) > at java.util.ArrayList.get(ArrayList.java:411) > at > org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86) > at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) > at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) > at > org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) > at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) > at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) > at > org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) > at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) > at > org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442) > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119) > ... 22 more > {code} > Workarounds include disabling ppd, > {code} > set hive.optimize.ppd=false; > {code} > Or changing the view so that column c2 is null cast to double: > {code} > CREATE VIEW v1_workaround AS > SELECT c1, c2 > FROM ( > SELECT c1, CAST(NULL AS DOUBLE) AS c2 FROM t1 > UNION ALL > SELECT c1, c2 FROM t2 > ) x; > {code} > The problem seems to occur in branch-1.1, branch-1.2, branch-1 but seems to > be resolved in master (2.0.0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)