Gopal V created HIVE-9931:
-----------------------------

             Summary: Approximate nDV statistics from ORC bloom filter population
                 Key: HIVE-9931
                 URL: https://issues.apache.org/jira/browse/HIVE-9931
             Project: Hive
          Issue Type: Improvement
          Components: Statistics
    Affects Versions: 1.2.0
            Reporter: Gopal V
The current CBO implementation requires column nDV statistics to produce good estimates of JOIN selectivity and filter selectivity.

The ORC bloom filters provide an opportunity to estimate the net population (distinct-value count) of a row-group, with the false-positive rate capped for each row-group.

This is not useful for filter or join conditions on columns whose cardinality is a large fraction of the row count. However, it can collect viable statistics for low-cardinality filter columns (de-normalization scenarios) or for low-cardinality JOIN dimension columns (demographics or store location).

The challenge in this feature is distinguishing between these two scenarios, not deriving the approximate nDV itself.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
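The "derivation of the approximate nDV" from a bloom filter's population can be illustrated with the standard bit-count estimator. If a filter has m bits and k hash functions, and X of those bits are set, the number of distinct inserted values is approximately n ≈ -(m/k) · ln(1 - X/m). The sketch below is a minimal illustration under stated assumptions. `ToyBloomFilter`, its SHA-256 double hashing, `approx_ndv` and the `MAX_FILL_RATIO` cut-off are all hypothetical. They are not ORC's or Hive's BloomFilter API or hashing scheme. The fill-ratio check is only a crude stand-in for the "distinguishing" problem the issue describes, not a solution to it.

```python
import hashlib
import math


class ToyBloomFilter:
    """Hypothetical stand-in for an ORC row-group bloom filter (not ORC's hashing)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value: str):
        # Double hashing h1 + i*h2 (mod m), both derived from one SHA-256 digest.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def merge(self, other: "ToyBloomFilter") -> None:
        # OR-merge: only meaningful when both filters share m, k and hashing.
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("bloom filters must share m and k to merge")
        for i, b in enumerate(other.bits):
            self.bits[i] |= b

    def popcount(self) -> int:
        return sum(self.bits)


def estimate_ndv(set_bits: int, m: int, k: int):
    """Distinct-insertion estimate from bit population: -(m/k) * ln(1 - X/m)."""
    if set_bits >= m:
        return None  # fully saturated filter: no usable estimate
    return -(m / k) * math.log(1.0 - set_bits / m)


# Hypothetical cut-off: above this fill ratio, treat the estimate as unreliable
# (the high-cardinality case where the filter carries little information).
MAX_FILL_RATIO = 0.9


def approx_ndv(bf: ToyBloomFilter):
    x = bf.popcount()
    if x / bf.m > MAX_FILL_RATIO:
        return None
    return estimate_ndv(x, bf.m, bf.k)


# Usage: three row-groups of a low-cardinality "store_id"-like column (50 values).
M_BITS, K_HASHES = 4096, 3
row_groups = []
for rg in range(3):
    bf = ToyBloomFilter(M_BITS, K_HASHES)
    for row in range(1000):
        bf.add(f"store-{(row * 7 + rg) % 50}")
    row_groups.append(bf)

merged = ToyBloomFilter(M_BITS, K_HASHES)
for bf in row_groups:
    merged.merge(bf)

# Expected to land near 50 (the shared value set), not 3 * 50.
print(approx_ndv(merged))
```

Because every row-group of a low-cardinality column sees the same small value set, OR-merging the row-group filters and counting bits estimates the union nDV without double-counting. A high-cardinality column instead drives the fill ratio towards 1, where the logarithm blows up and the estimate stops being informative.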