[jira] [Comment Edited] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

Anthony Hsu (JIRA) Fri, 20 Jan 2017 19:44:02 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832786#comment-15832786
 ]


Anthony Hsu edited comment on HIVE-15680 at 1/21/17 3:42 AM:
-------------------------------------------------------------

Same issue, even with explicit aliases:
{noformat}
hive (default)> set hive.optimize.index.filter=true;
hive (default)> select * from test_table x where number = 1
              > union all
              > select * from test_table y where number = 2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
Query ID = ahsu_20170120193810_ffa4adbb-e408-4505-82aa-5abeb7a5dd1c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-01-20 19:38:11,937 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local876667430_0002
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2
Time taken: 1.711 seconds, Fetched: 1 row(s)
{noformat}

Here's the explain plan, which does show a single mapper processing two table 
scans:
{noformat}
hive (default)> explain
              > select * from test_table x where number = 1
              > union all
              > select * from test_table y where number = 2;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: x
            filterExpr: (number = 1) (type: boolean)
            Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column 
stats: NONE
            Filter Operator
              predicate: (number = 1) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
stats: NONE
              Select Operator
                expressions: 1 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
                    table:
                        input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          TableScan
            alias: y
            filterExpr: (number = 2) (type: boolean)
            Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column 
stats: NONE
            Filter Operator
              predicate: (number = 2) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
stats: NONE
              Select Operator
                expressions: 2 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
                    table:
                        input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.237 seconds, Fetched: 55 row(s)
{noformat}


was (Author: erwaman):
Same issue, even with explicit aliases:
{noformat}
hive (default)> set hive.optimize.index.filter=true;
hive (default)> select * from test_table x where number = 1
              > union all
              > select * from test_table y where number = 2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
Query ID = ahsu_20170120193810_ffa4adbb-e408-4505-82aa-5abeb7a5dd1c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-01-20 19:38:11,937 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local876667430_0002
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2
Time taken: 1.711 seconds, Fetched: 1 row(s)
{noformat}

> Incorrect results when hive.optimize.index.filter=true and same ORC table is 
> referenced twice in query
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15680
>                 URL: https://issues.apache.org/jira/browse/HIVE-15680
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0, 2.2.0
>            Reporter: Anthony Hsu
>            Assignee: Anthony Hsu
>
> To repro:
> {noformat}
> set hive.optimize.index.filter=true;
> create table test_table(number int) stored as ORC;
> -- Two insertions will create two files, with one stripe each
> insert into table test_table VALUES (1);
> insert into table test_table VALUES (2);
> -- This should and does return 2 records
> select * from test_table;
> -- These should and do each return 1 record
> select * from test_table where number = 1;
> select * from test_table where number = 2;
> -- This should return 2 records but only returns 1 record
> select * from test_table where number = 1
> union all
> select * from test_table where number = 2;
> {noformat}
> What's happening is only the last predicate is being pushed down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (HIVE-15680) Incorrect results when hive.optimize.index.filter=true and same ORC table is referenced twice in query

Reply via email to