Gopal V created HIVE-12094:
------------------------------

             Summary: nDV of aggregate columns tend to be log scale - not unique
                 Key: HIVE-12094
                 URL: https://issues.apache.org/jira/browse/HIVE-12094
             Project: Hive
          Issue Type: Improvement
          Components: Statistics
    Affects Versions: 1.3.0, 2.0.0
            Reporter: Gopal V

Stats for aggregate columns are not processed properly when the aggregate is treated as a simple nDV.

{code}
select count(distinct l_suppkey) from lineitem group by l_orderkey having count(distinct l_suppkey) = 1
{code}

will mis-estimate the cardinality of the output by a significant margin.

The nDV of an aggregate column is generally log-scale and skews towards a very low number, which is not accounted for in the StatsRulesProcFactory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
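
As a rough illustration of the mis-estimate described in this issue, the sketch below compares two assumed nDV values for the aggregate column under a textbook equality-filter estimate (rows / nDV). It is not Hive's code: the function names, the group count, and the log2 scaling are all hypothetical and chosen only to show the direction and size of the effect.

```python
import math


def rows_after_equality_filter(num_rows: int, ndv: int) -> int:
    """Uniform-distribution estimate for `col = const`: rows / nDV, at least 1.

    Assumption: this is the textbook formula, used here for illustration only.
    """
    return max(1, num_rows // max(1, ndv))


def ndv_treated_as_unique(num_groups: int) -> int:
    """Hypothetical: assume every group produces a different aggregate value."""
    return num_groups


def ndv_log_scale(num_groups: int) -> int:
    """Hypothetical: assume the aggregate's nDV grows with log2 of the group count."""
    if num_groups <= 1:
        return 1
    return max(1, math.ceil(math.log2(num_groups)))


# Hypothetical number of groups produced by `group by l_orderkey`.
num_groups = 1_500_000

# Estimated rows surviving `having count(distinct l_suppkey) = 1`.
naive = rows_after_equality_filter(num_groups, ndv_treated_as_unique(num_groups))
scaled = rows_after_equality_filter(num_groups, ndv_log_scale(num_groups))

print("unique nDV estimate:   ", naive)
print("log-scale nDV estimate:", scaled)
```

Under these assumptions, treating the aggregate's nDV as unique collapses the estimate to a single row, while a log-scale nDV leaves the estimate several orders of magnitude larger.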