Brian Goerlitz created HIVE-17390:
-------------------------------------
Summary: Select count(distinct) returns incorrect results using tez
Key: HIVE-17390
URL: https://issues.apache.org/jira/browse/HIVE-17390
Project: Hive
Issue Type: Bug
Components: Query Planning
Affects Versions: 1.2.1
Reporter: Brian Goerlitz
With the following combination of settings, select count(distinct) returns
the result of select sum(distinct) on the same column:
hive.execution.engine=tez
hive.optimize.reducededuplication=true
hive.optimize.reducededuplication.min.reducer=1
hive.optimize.distinct.rewrite=true
hive.groupby.skewindata=false
hive.vectorized.execution.reduce.enabled=true
STEPS TO REPRODUCE:
{quote}CREATE TABLE `simple_data`(ppmonth int, sale double);
INSERT INTO simple_data VALUES
(501,25000.0),(502,60000.0),(501,40000.0),(502,70000.0),(501,35000.0),(502,60000.0);
set hive.execution.engine=tez;
set hive.optimize.reducededuplication=true;
set hive.optimize.reducededuplication.min.reducer=1;
set hive.optimize.distinct.rewrite=true;
set hive.groupby.skewindata=false;
set hive.vectorized.execution.reduce.enabled=true;
select count(distinct ppmonth) from simple_data;{quote}
Actual result: 1003 (the sum of the distinct ppmonth values). Expected result: 2.
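As a sanity check outside Hive, the following Python sketch recomputes both aggregates over the six rows from the reproduction. It does not model Hive or Tez in any way, and the variable names are illustrative only. It shows which aggregate each of the two numbers corresponds to.

```python
# Rows from the INSERT statement in the reproduction: (ppmonth, sale).
rows = [
    (501, 25000.0), (502, 60000.0), (501, 40000.0),
    (502, 70000.0), (501, 35000.0), (502, 60000.0),
]

# DISTINCT over ppmonth collapses the six rows to the set {501, 502}.
distinct_ppmonth = {ppmonth for ppmonth, _sale in rows}

# Value that count(distinct ppmonth) is expected to return.
count_distinct = len(distinct_ppmonth)

# Value that sum(distinct ppmonth) would return.
sum_distinct = sum(distinct_ppmonth)

print(count_distinct, sum_distinct)  # prints "2 1003"
```

The second number matches the value reported by the failing query, which is consistent with the count being replaced by a sum over the distinct keys.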
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)