[ https://issues.apache.org/jira/browse/HIVE-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth Jayachandran updated HIVE-9689:
----------------------------------------
    Summary: Store histograms and distinct value estimator's bit vectors in metastore  (was: Store distinct value estimator's bit vectors in metastore)

> Store histograms and distinct value estimator's bit vectors in metastore
> ------------------------------------------------------------------------
>
>                 Key: HIVE-9689
>                 URL: https://issues.apache.org/jira/browse/HIVE-9689
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Prasanth Jayachandran
>              Labels: gsoc, gsoc2015, hive, java
>
> Hive currently uses the PCSA (Probabilistic Counting with Stochastic
> Averaging) algorithm to estimate distinct cardinality. The NDV (number of
> distinct values) estimate computed by the UDF is stored in the metastore
> instead of the underlying bit vectors. Because partitions can share values,
> per-partition NDVs cannot be combined, which makes it impossible to estimate
> the overall NDV across all partitions (or a selected subset of partitions).
> Ideally, we should store the bit vectors in the metastore and merge them on
> the server side. We could also replace the current PCSA algorithm with
> HyperLogLog if space is a constraint.
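To illustrate why storing the bit vectors matters, here is a minimal sketch of a PCSA-style estimator in Java. It is not Hive's implementation: the class name PcsaSketch, its methods, the bucket count, and the hash (a splitmix64-style mixer) are all assumptions made for this example. The point it shows is that two per-partition sketches can be merged with a bitwise OR into exactly the sketch that a single pass over both partitions would have produced, whereas two stored NDV numbers cannot be combined when the partitions overlap.

```java
// Hypothetical illustration only; not the estimator class used inside Hive.
final class PcsaSketch {
    // Flajolet-Martin correction constant for probabilistic counting.
    private static final double PHI = 0.77351;

    // One 32-bit bitmap per bucket (the "bit vectors" from the issue).
    private final int[] bitmaps;

    PcsaSketch(int numBuckets) {
        this.bitmaps = new int[numBuckets];
    }

    // splitmix64-style finalizer, used here as a stand-in hash (assumption).
    private static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    void add(long value) {
        long h = mix64(value + 0x9e3779b97f4a7c15L);
        // Stochastic averaging: part of the hash picks the bucket ...
        int bucket = (int) Long.remainderUnsigned(h, bitmaps.length);
        // ... and the rest decides which bit of that bucket's bitmap to set.
        long rest = Long.divideUnsigned(h, bitmaps.length);
        int rho = Math.min(Long.numberOfTrailingZeros(rest), 31);
        bitmaps[bucket] |= 1 << rho;
    }

    // Merging is a per-bucket bitwise OR, so it needs the bit vectors,
    // not just the final NDV number.
    void merge(PcsaSketch other) {
        if (other.bitmaps.length != bitmaps.length) {
            throw new IllegalArgumentException("bucket counts differ");
        }
        for (int i = 0; i < bitmaps.length; i++) {
            bitmaps[i] |= other.bitmaps[i];
        }
    }

    double estimate() {
        int sum = 0;
        for (int b : bitmaps) {
            // Index of the lowest zero bit in this bucket's bitmap.
            sum += Integer.numberOfTrailingZeros(~b);
        }
        double m = bitmaps.length;
        return m / PHI * Math.pow(2.0, sum / m);
    }

    boolean sameBits(PcsaSketch other) {
        return java.util.Arrays.equals(bitmaps, other.bitmaps);
    }

    public static void main(String[] args) {
        PcsaSketch p1 = new PcsaSketch(64);    // partition 1: values 0..99999
        PcsaSketch p2 = new PcsaSketch(64);    // partition 2: values 50000..149999
        PcsaSketch both = new PcsaSketch(64);  // single pass over both partitions
        for (long v = 0; v < 100_000; v++) { p1.add(v); both.add(v); }
        for (long v = 50_000; v < 150_000; v++) { p2.add(v); both.add(v); }

        double ndvSum = p1.estimate() + p2.estimate(); // double-counts the overlap
        p1.merge(p2);                                  // server-side merge
        System.out.println("sum of per-partition NDVs: " + ndvSum);
        System.out.println("merged-sketch NDV:         " + p1.estimate());
        System.out.println("merge equals single pass:  " + p1.sameBits(both));
    }
}
```

With 64 buckets the stored state is 64 ints per column per partition in this sketch; a HyperLogLog register array would be smaller for comparable accuracy, which is the space trade-off the description alludes to.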