[ https://issues.apache.org/jira/browse/HIVE-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-9931:
--------------------------
    Labels: ORC  (was: )

> Approximate nDV statistics from ORC bloom filter population
> -----------------------------------------------------------
>
>                 Key: HIVE-9931
>                 URL: https://issues.apache.org/jira/browse/HIVE-9931
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 1.2.0
>            Reporter: Gopal V
>              Labels: ORC
>
> The current CBO implementation requires column nDV (number of distinct
> values) statistics to produce good estimates of JOIN selectivity and filter
> selectivity.
> The ORC bloom filters provide an opportunity to estimate the net population
> of a row-group, with false-positive rates capped for each row-group.
> This is not useful for filter or join conditions on columns whose
> cardinality is a large fraction of the row count, but it can collect
> viable statistics for low-cardinality filter columns (de-normalization
> scenarios) or for JOIN dimension columns of low cardinality (demographics or
> store location).
> The challenge in this feature is in distinguishing between these two
> scenarios, not in the derivation of the approximate nDV itself.
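For readers unfamiliar with the derivation the description calls the easy part, here is a minimal sketch of one common way to approximate the population of a bloom filter from its fill ratio: with m bits, k hash functions and X bits set, n is roughly -(m/k) * ln(1 - X/m). The issue does not state which estimator is intended, so this is an assumption. The class and method names below are hypothetical and are not part of the ORC or Hive APIs.

```java
// Hypothetical helper, not an ORC/Hive class: estimates how many distinct
// values were inserted into a bloom filter from the number of bits set.
final class BloomNdvEstimate {

    /**
     * Fill-ratio estimator: n ~= -(m / k) * ln(1 - X / m).
     *
     * @param numBits          m, total bits in the filter
     * @param numHashFunctions k, hash functions applied per inserted value
     * @param bitsSet          X, number of bits currently set
     * @return estimated distinct insertions, or positive infinity when the
     *         filter is saturated and carries no usable information
     */
    static double estimateNdv(long numBits, int numHashFunctions, long bitsSet) {
        if (numBits <= 0 || numHashFunctions <= 0
                || bitsSet < 0 || bitsSet > numBits) {
            throw new IllegalArgumentException("invalid bloom filter parameters");
        }
        if (bitsSet == numBits) {
            // ln(0) is undefined: every bit is set, so the estimate is unbounded.
            return Double.POSITIVE_INFINITY;
        }
        double fill = (double) bitsSet / numBits;
        // log1p(-fill) computes ln(1 - fill) accurately for small fill ratios.
        return -((double) numBits / numHashFunctions) * Math.log1p(-fill);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: a 1024-bit filter with 4 hash functions.
        System.out.println(estimateNdv(1024, 4, 4));
        System.out.println(estimateNdv(1024, 4, 900));
    }
}
```

The estimate degrades as the fill ratio approaches 1, which is one way to see why high-cardinality columns are a poor fit. Summing per-row-group estimates would double count values that repeat across row-groups; when filters share the same size and hash functions, OR-ing their bit sets before estimating avoids that.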