[ https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832786#comment-15832786 ]
Anthony Hsu edited comment on HIVE-15680 at 1/21/17 3:42 AM:
-------------------------------------------------------------

Same issue, even with explicit aliases:
{noformat}
hive (default)> set hive.optimize.index.filter=true;
hive (default)> select * from test_table x where number = 1
              > union all
              > select * from test_table y where number = 2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = ahsu_20170120193810_ffa4adbb-e408-4505-82aa-5abeb7a5dd1c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-01-20 19:38:11,937 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local876667430_0002
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2
Time taken: 1.711 seconds, Fetched: 1 row(s)
{noformat}

Here's the explain plan, which does show a single mapper processing two table scans:
{noformat}
hive (default)> explain
              > select * from test_table x where number = 1
              > union all
              > select * from test_table y where number = 2;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: x
            filterExpr: (number = 1) (type: boolean)
            Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (number = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: 1 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          TableScan
            alias: y
            filterExpr: (number = 2) (type: boolean)
            Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (number = 2) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: 2 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.237 seconds, Fetched: 55 row(s)
{noformat}

was (Author: erwaman):
Same issue, even with explicit aliases:
{noformat}
hive (default)> set hive.optimize.index.filter=true;
hive (default)> select * from test_table x where number = 1
              > union all
              > select * from test_table y where number = 2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = ahsu_20170120193810_ffa4adbb-e408-4505-82aa-5abeb7a5dd1c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-01-20 19:38:11,937 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local876667430_0002
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2
Time taken: 1.711 seconds, Fetched: 1 row(s)
{noformat}

> Incorrect results when hive.optimize.index.filter=true and same ORC table is
> referenced twice in query
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15680
>                 URL: https://issues.apache.org/jira/browse/HIVE-15680
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0, 2.2.0
>            Reporter: Anthony Hsu
>            Assignee: Anthony Hsu
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
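
To illustrate the diagnosis quoted above ("only the last predicate is being pushed down"), here is a toy Python model of one mapper running two table scans over the same two single-stripe files. It is not Hive code: the names `OrcFile`, `might_contain`, `run_union`, and `shared_slot` are hypothetical, and the idea that both scans share one pushdown slot is an assumption made only to reproduce the observed symptom.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OrcFile:
    """Hypothetical stand-in for one ORC file holding a single stripe."""
    rows: List[int]

    def might_contain(self, value: int) -> bool:
        # Min/max statistics check: the reader skips the stripe when the
        # pushed-down value cannot fall inside the stripe's value range.
        return min(self.rows) <= value <= max(self.rows)


def run_union(files: List[OrcFile], predicates: Dict[str, int],
              shared_slot: bool) -> List[int]:
    """Model `... where number = a UNION ALL ... where number = b`.

    predicates maps a table alias to the value in its `number = ?` filter.
    shared_slot=True is the ASSUMED buggy behaviour: every scan writes its
    pushed-down predicate to the same slot, so the last one wins.
    """
    pushed: Dict[str, int] = {}
    for alias, value in predicates.items():
        pushed["filter" if shared_slot else alias] = value

    out: List[int] = []
    for alias, value in predicates.items():
        pushed_value = pushed["filter" if shared_slot else alias]
        for f in files:
            if not f.might_contain(pushed_value):
                continue  # stripe skipped by the pushed-down predicate
            # The scan's own Filter Operator still applies its real predicate.
            out.extend(r for r in f.rows if r == value)
    return out


files = [OrcFile([1]), OrcFile([2])]   # two inserts, two files
predicates = {"x": 1, "y": 2}          # aliases from the query above

buggy = run_union(files, predicates, shared_slot=True)
fixed = run_union(files, predicates, shared_slot=False)
print("shared slot:", buggy)
print("per-alias slots:", fixed)
```

Under these assumptions, scan `x` reads only the stripe that matches scan `y`'s predicate and then filters every row out, which would leave the single row `2` seen in the output above. Keeping one pushed-down predicate per alias returns both rows.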