Brian Goerlitz created HIVE-17390:
-------------------------------------

             Summary: Select count(distinct) returns incorrect results using tez
                 Key: HIVE-17390
                 URL: https://issues.apache.org/jira/browse/HIVE-17390
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
    Affects Versions: 1.2.1
            Reporter: Brian Goerlitz


With the following combination of settings, select count(distinct) returns 
the result of select sum(distinct) instead:
hive.execution.engine=tez
hive.optimize.reducededuplication=true
hive.optimize.reducededuplication.min.reducer=1
hive.optimize.distinct.rewrite=true
hive.groupby.skewindata=false
hive.vectorized.execution.reduce.enabled=true

STEPS TO REPRODUCE:
{quote}CREATE TABLE `simple_data`(ppmonth int, sale double);
INSERT INTO simple_data VALUES 
(501,25000.0),(502,60000.0),(501,40000.0),(502,70000.0),(501,35000.0),(502,60000.0);
set hive.execution.engine=tez;
set hive.optimize.reducededuplication=true;
set hive.optimize.reducededuplication.min.reducer=1;
set hive.optimize.distinct.rewrite=true;
set hive.groupby.skewindata=false;
set hive.vectorized.execution.reduce.enabled=true;
select count(distinct ppmonth) from simple_data;{quote}
The query returns 1003 (the value of sum(distinct ppmonth), i.e. 501 + 502) rather than the expected 2.
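As a quick cross-check outside Hive, the following Python sketch computes both aggregates over the same six rows from the INSERT statement. It is not Hive code. It only shows the expected arithmetic and confirms that the reported 1003 matches sum(distinct ppmonth), not count(distinct ppmonth).

```python
# Rows from the INSERT statement in the reproduction: (ppmonth, sale)
rows = [
    (501, 25000.0), (502, 60000.0), (501, 40000.0),
    (502, 70000.0), (501, 35000.0), (502, 60000.0),
]

# Distinct values of the ppmonth column
distinct_ppmonth = {ppmonth for ppmonth, _ in rows}

count_distinct = len(distinct_ppmonth)  # what count(distinct ppmonth) should return
sum_distinct = sum(distinct_ppmonth)    # what sum(distinct ppmonth) returns

print(f"count(distinct ppmonth) = {count_distinct}")  # 2
print(f"sum(distinct ppmonth)   = {sum_distinct}")    # 1003, the value the buggy query reports
```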



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
