Nicholas Brenwald created HIVE-12678: ----------------------------------------
Summary: BETWEEN relational operator sometimes returns incorrect results against PARQUET tables Key: HIVE-12678 URL: https://issues.apache.org/jira/browse/HIVE-12678 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Nicholas Brenwald When querying a parquet table, the BETWEEN relational operator returns incorrect results when hive.optimize.index.filter and hive.optimize.ppd.storage are enabled Create a parquet table: {code} create table t(c string) stored as parquet; {code} Insert some strings representing dates {code} insert into t select '2015-12-09' from default.dual limit 1; insert into t select '2015-12-10' from default.dual limit 1; insert into t select '2015-12-11' from default.dual limit 1; {code} h3. Example 1 This query correctly returns 3: {code} set hive.optimize.index.filter=true; set hive.optimize.ppd.storage=true; select count(*) from t where c >= '2015-12-09' and c <= '2015-12-11'; +------+--+ | _c0 | +------+--+ | 3 | +------+--+ {code} This query incorrectly returns 1: {code} set hive.optimize.index.filter=true; set hive.optimize.ppd.storage=true; select count(*) from t where c between '2015-12-09' and '2015-12-11'; +------+--+ | _c0 | +------+--+ | 1 | +------+--+ {code} Disabling hive.optimize.findex.filter resolves the problem. This query now correctly returns 3: {code} set hive.optimize.index.filter=false; set hive.optimize.ppd.storage=true; select count(*) from t where c between '2015-12-09' and '2015-12-11'; +------+--+ | _c0 | +------+--+ | 3 | +------+--+ {code} Disabling hive.optimize.ppd.storage resolves the problem. This query now correctly returns 3: {code} set hive.optimize.index.filter=true; set hive.optimize.ppd.storage=false; select count(*) from t where c between '2015-12-09' and '2015-12-11'; +------+--+ | _c0 | +------+--+ | 3 | +------+--+ {code} h3. Example 2 This query correctly returns 1: {code} set hive.optimize.index.filter=true; set hive.optimize.ppd.storage=true; select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10'; +------+--+ | _c0 | +------+--+ | 1 | +------+--+ {code} This query incorrectly returns 0: {code} set hive.optimize.index.filter=true; set hive.optimize.ppd.storage=true; select count(*) from t where c between '2015-12-10' and '2015-12-10'; +------+--+ | _c0 | +------+--+ | 0 | +------+--+ {code} Disabling hive.optimize.findex.filter resolves the problem. This query now correctly returns 1: {code} set hive.optimize.index.filter=false; set hive.optimize.ppd.storage=true; select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10'; +------+--+ | _c0 | +------+--+ | 1 | +------+--+ {code} Disabling hive.optimize.ppd.storage resolves the problem. This query now correctly returns 1: {code} set hive.optimize.index.filter=true; set hive.optimize.ppd.storage=false; select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10'; +------+--+ | _c0 | +------+--+ | 1 | +------+--+ {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)