Gopal V created HIVE-9931:
-----------------------------

             Summary: Approximate nDV statistics from ORC bloom filter population
                 Key: HIVE-9931
                 URL: https://issues.apache.org/jira/browse/HIVE-9931
             Project: Hive
          Issue Type: Improvement
          Components: Statistics
    Affects Versions: 1.2.0
            Reporter: Gopal V


The current cost-based optimizer (CBO) implementation requires column nDV 
(number of distinct values) statistics to produce good estimates of JOIN 
selectivity and filter selectivity.

The ORC bloom filters provide an opportunity to estimate the net population of 
a row-group, with the false-positive rate capped for each row-group.
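As a rough illustration of how a population can be read back out of a bloom 
filter, the sketch below uses the standard fill-ratio estimate 
n ~= -(m / k) * ln(1 - X / m), where m is the number of bits, k the number of 
hash functions and X the number of set bits. The ToyBloomFilter class, its 
SHA-256 based hashing and the estimate_population name are assumptions made 
for this example only; they are not ORC's bloom filter implementation or API.

```python
import hashlib
import math


class ToyBloomFilter:
    """Illustrative Bloom filter; NOT the ORC implementation (assumption)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def add(self, value: str) -> None:
        # Derive k bit positions from two 64-bit halves of one digest
        # (double hashing). The hash choice is arbitrary for this sketch.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd stride
        for i in range(self.num_hashes):
            self.bits[(h1 + i * h2) % self.num_bits] = 1

    def set_bit_count(self) -> int:
        return sum(self.bits)


def estimate_population(set_bits: int, num_bits: int, num_hashes: int) -> float:
    """Approximate the number of distinct values inserted into the filter."""
    if set_bits <= 0:
        return 0.0
    if set_bits >= num_bits:
        # A fully saturated filter carries no usable population information.
        return math.inf
    return -(num_bits / num_hashes) * math.log(1.0 - set_bits / num_bits)
```

Because repeated values set the same bits, inserting every row of a row-group 
and then calling estimate_population(bf.set_bit_count(), bf.num_bits, 
bf.num_hashes) approximates the distinct count rather than the row count. The 
estimate degrades as the filter fills up.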

This is not useful for filter or join conditions on columns whose cardinality 
is a large fraction of the row count, but it can collect viable statistics 
for low-cardinality filter columns (de-normalization scenarios) or for 
low-cardinality JOIN dimension columns (demographics or store location).

The challenge in this feature is in distinguishing between these two scenarios, 
not in the derivation of the approximate nDV itself.
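One possible way to separate the two scenarios is sketched below. It is only a 
heuristic under stated assumptions, not a design proposed in this issue: the 
classify_ndv_estimate function and the max_fill and max_ndv_fraction 
thresholds are hypothetical and chosen purely for illustration. The idea is to 
reject an estimate when the filter is too saturated to trust, or when the 
estimated nDV is a large fraction of the row count.

```python
import math
from typing import NamedTuple, Optional


class NdvDecision(NamedTuple):
    usable: bool
    ndv: Optional[float]  # None when the filter is too saturated to estimate


def classify_ndv_estimate(
    set_bits: int,
    num_bits: int,
    num_hashes: int,
    row_count: int,
    max_fill: float = 0.5,          # hypothetical saturation cut-off
    max_ndv_fraction: float = 0.1,  # hypothetical "low cardinality" cut-off
) -> NdvDecision:
    """Decide whether a bloom-filter-derived nDV looks trustworthy."""
    fill = set_bits / num_bits
    if fill > max_fill:
        # Heavily filled filter: likely a high-cardinality column.
        return NdvDecision(False, None)
    if set_bits == 0:
        return NdvDecision(True, 0.0)
    # Standard fill-ratio population estimate for a Bloom filter.
    ndv = -(num_bits / num_hashes) * math.log(1.0 - fill)
    ndv = min(ndv, float(row_count))  # nDV cannot exceed the row count
    return NdvDecision(ndv <= max_ndv_fraction * row_count, ndv)
```

For example, classify_ndv_estimate(200, 16384, 4, 10000) treats the column as 
low-cardinality and returns a usable estimate, while a filter with 15000 of 
16384 bits set is rejected outright. Real thresholds would have to be tuned 
against the false-positive cap configured for the ORC bloom filters.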



