Gopal V created HIVE-12094:
------------------------------

             Summary: nDV of aggregate columns tend to be log scale - not unique
                 Key: HIVE-12094
                 URL: https://issues.apache.org/jira/browse/HIVE-12094
             Project: Hive
          Issue Type: Improvement
          Components: Statistics
    Affects Versions: 1.3.0, 2.0.0
            Reporter: Gopal V


Stats for aggregate columns are not processed properly if the column is declared as a simple nDV. For example, the query

{code}
select count(distinct l_suppkey)
from lineitem
group by l_orderkey
having count(distinct l_suppkey) = 1
{code}

causes the cardinality of the output to be mis-estimated by a significant margin.

In general, the nDV of an aggregate column is on a log scale and skews towards
a very low number, which is not accounted for in StatsRulesProcFactory.
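
To make the size of the gap concrete, here is a rough simulation sketch rather than Hive code. It assumes each order has between 1 and 7 line items with uniformly random suppliers, and that an equality filter is estimated as rows / nDV. Both are simplifying assumptions for illustration, and the function names (simulate_aggregate, equality_estimate) are hypothetical; none of this is taken from StatsRulesProcFactory.

```python
import random
from collections import Counter


def simulate_aggregate(num_orders=10_000, num_suppliers=1_000, max_items=7, seed=42):
    """Return count(distinct supplier) for each simulated order (one row per group)."""
    rng = random.Random(seed)
    agg = []
    for _ in range(num_orders):
        # Assumption: 1..max_items line items per order, suppliers drawn uniformly.
        n_items = rng.randint(1, max_items)
        suppliers = {rng.randrange(num_suppliers) for _ in range(n_items)}
        agg.append(len(suppliers))
    return agg


def equality_estimate(num_rows, ndv):
    """Assumed textbook rule: rows surviving 'col = const' is rows / nDV."""
    return num_rows / ndv


if __name__ == "__main__":
    agg = simulate_aggregate()
    num_rows = len(agg)

    # The aggregate column can only take values 1..max_items, so its true nDV is tiny.
    true_ndv = len(set(agg))
    actual = Counter(agg)[1]  # groups that really satisfy "having ... = 1"

    # Treating the aggregate as unique means nDV == number of groups.
    naive = equality_estimate(num_rows, num_rows)
    adjusted = equality_estimate(num_rows, true_ndv)

    print("groups:", num_rows, "true nDV of aggregate:", true_ndv)
    print("actual rows with value 1:", actual)
    print("estimate if treated as unique:", naive)
    print("estimate with true nDV:", adjusted)
```

Under these assumptions, treating the aggregate as unique always yields an estimate of one row, while the real output grows with the number of groups.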



