[ 
https://issues.apache.org/jira/browse/HIVE-16932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062321#comment-16062321
 ] 

Rajkumar Singh commented on HIVE-16932:
---------------------------------------

[~jcamachorodriguez] it seems you have hive.optimize.index.filter disabled, 
which is why you get the correct result every time. With PPD enabled and 
hive.optimize.index.filter=true, the query over the ORC table returns a count 
of 14999, because the predicate expression pushed down to the ORC split is 
wrong for the NOT BETWEEN operator: the NOT is dropped from the second leaf.
{code}
[INFO] [InputInitializer {Map 1} #0] |orc.OrcInputFormat|: ORC pushdown predicate: leaf-0 = (BETWEEN c 0 100000), leaf-1 = (BETWEEN c 25000 50000), expr = (and leaf-0 leaf-1)
{code}
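
The following is a small Python simulation, not Hive or ORC code, sketching 
one way the wrong pushdown expression could produce exactly 14999. It rests on 
assumptions not stated in this issue: the ORC table is a single file with c 
stored in ascending order, the row index stride is 10000 (the ORC default for 
orc.row.index.stride), and the row-level filter still applies the original 
WHERE clause to whatever row groups the reader keeps. All function names are 
hypothetical.

```python
# Simulation only: models row-group pruning with plain Python lists.
N = 100_000
STRIDE = 10_000  # assumed orc.row.index.stride (ORC default)

values = list(range(N + 1))  # posexplode yields positions 0..N


def row_filter(c):
    """The WHERE clause exactly as written in the query."""
    return 0 <= c <= N and not (N // 4 <= c <= N // 2)


def overlaps(lo, hi, a, b):
    """True if a row group with min/max [lo, hi] may hold values in [a, b]."""
    return lo <= b and hi >= a


def pushed_predicate_buggy(lo, hi):
    # expr = (and leaf-0 leaf-1): the NOT around leaf-1 has been lost,
    # so only row groups overlapping [N/4, N/2] are read.
    return overlaps(lo, hi, 0, N) and overlaps(lo, hi, N // 4, N // 2)


def count(rows, keep_group):
    total = 0
    for start in range(0, len(rows), STRIDE):
        group = rows[start:start + STRIDE]
        # Rows are sorted, so the first and last values are the min and max.
        if keep_group(group[0], group[-1]):
            total += sum(1 for c in group if row_filter(c))
    return total


expected = count(values, lambda lo, hi: True)   # no pruning (text table)
buggy = count(values, pushed_predicate_buggy)   # pruning with the wrong expr
print(expected, buggy)  # 75000 14999
```

Under these assumptions only the row groups covering 20000..59999 are read, 
and the rows in them that satisfy the WHERE clause are 20000..24999 and 
50001..59999, which is 14999 rows. A correct pushdown expression would 
presumably keep the negation, for example (and leaf-0 (not leaf-1)).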

> incorrect predicate evaluation
> ------------------------------
>
>                 Key: HIVE-16932
>                 URL: https://issues.apache.org/jira/browse/HIVE-16932
>             Project: Hive
>          Issue Type: Bug
>          Components: CLI, Hive, ORC
>    Affects Versions: 1.2.1
>         Environment: CentOS, HDP 2.6
>            Reporter: Jim Hopper
>
> Hive returns an incorrect number of rows when the BETWEEN and NOT BETWEEN 
> operators are used together in the WHERE clause of a query against a table 
> that uses ORC as its storage format.
> Script to replicate the issue on HDP 2.6:
> {code}
> SET hive.exec.compress.output=false;
> SET hive.vectorized.execution.enabled=false;
> SET hive.optimize.ppd=true;
> SET hive.optimize.ppd.storage=true;
> SET N=100000;
> SET TTT=default.tmp_tbl_text;
> SET TTO=default.tmp_tbl_orc;
> DROP TABLE IF EXISTS ${hiveconf:TTT};
> DROP TABLE IF EXISTS ${hiveconf:TTO};
> create table ${hiveconf:TTT}
> stored as textfile
> as
> select pos as c
> from (
>     select posexplode(split(repeat(',', ${hiveconf:N}), ','))
> ) as t;
> create table ${hiveconf:TTO}
> stored as orc
> as
> select c
> from ${hiveconf:TTT};
> SELECT count(c) as cnt
> FROM ${hiveconf:TTT}
> WHERE
>     c between 0 and ${hiveconf:N}
>     and c not between ${hiveconf:N} div 4 and ${hiveconf:N} div 2
> ;
> SELECT count(c) as cnt
> FROM ${hiveconf:TTO}
> WHERE
>     c between 0 and ${hiveconf:N}
>     and c not between ${hiveconf:N} div 4 and ${hiveconf:N} div 2
> ;
> DROP TABLE IF EXISTS ${hiveconf:TTT};
> DROP TABLE IF EXISTS ${hiveconf:TTO};
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
