Hi Alex

Here is the JIRA that tracks column group statistics:
https://issues.apache.org/jira/browse/HIVE-6540
Computing count distinct exactly demands a lot of memory, especially when 
there is a very large number of distinct values. To avoid such a large memory 
requirement, probabilistic data structures are often used to get an approximate 
count. Hive uses such probabilistic algorithms to estimate the distinct count. 
The error rate of the estimate can be tuned using “hive.stats.ndv.error”. By 
default the error is set to 20%.
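To illustrate the idea, here is a minimal sketch of one such structure, a Flajolet-Martin (PCSA) style estimator, in Python. It is not Hive's code, and Hive's own estimator may differ in its details; the class and method names (FMSketch, add, merge, estimate) and the num_bitmaps parameter are hypothetical. Feeding it (x, y) tuples instead of single values is one way the same technique could cover count(distinct x, y).

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin bias-correction constant


def _hash64(value) -> int:
    """Stable 64-bit hash of a value (or of a tuple of column values)."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def _rho(x: int, max_bits: int) -> int:
    """Index of the least-significant 1-bit of x, capped at max_bits - 1."""
    if x == 0:
        return max_bits - 1
    return min((x & -x).bit_length() - 1, max_bits - 1)


class FMSketch:
    """Hypothetical PCSA-style distinct-count estimator (not Hive's class)."""

    def __init__(self, num_bitmaps: int = 64, bits: int = 32):
        self.m = num_bitmaps
        self.bits = bits
        self.bitmaps = [0] * num_bitmaps  # memory is O(num_bitmaps), not O(n)

    def add(self, value) -> None:
        h = _hash64(value)
        j = h % self.m  # stochastic averaging: route the value to one bitmap
        r = _rho(h // self.m, self.bits)
        self.bitmaps[j] |= 1 << r  # duplicates set the same bit again

    def merge(self, other: "FMSketch") -> None:
        # Sketches built over different partitions combine with bitwise OR.
        for j in range(self.m):
            self.bitmaps[j] |= other.bitmaps[j]

    def estimate(self) -> float:
        total = 0
        for bm in self.bitmaps:
            r = 0
            while bm & (1 << r):  # position of the lowest 0 bit
                r += 1
            total += r
        return self.m / PHI * 2 ** (total / self.m)


if __name__ == "__main__":
    single, group = FMSketch(), FMSketch()
    for x in range(300):
        for y in range(300):
            single.add(x)       # approximate count(distinct x)
            group.add((x, y))   # approximate count(distinct x, y)
    print(round(single.estimate()), round(group.estimate()))
```

More bitmaps give a lower error (for PCSA, roughly 0.78/sqrt(num_bitmaps)) at the cost of more memory, which is presumably the kind of trade-off “hive.stats.ndv.error” tunes. Because merge() is a bitwise OR, sketches computed per partition or per mapper can be combined without rescanning the data.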

Thanks
Prasanth Jayachandran

On Jun 9, 2014, at 12:33 PM, Alex Nastetsky <anastet...@spryinc.com> wrote:

> Thanks Prasanth.
> 
> Do we already have a ticket to add support for this or should I create one?
> Also, do you know why the single column distinct value is only an 
> approximation instead of exact?
> 
> Thanks.
> 
> 
> On Sun, Jun 8, 2014 at 10:13 PM, Prasanth Jayachandran 
> <pjayachand...@hortonworks.com> wrote:
> Column group statistics are not supported in Hive yet.
> 
> Thanks
> Prasanth
> 
> > On Jun 8, 2014, at 6:33 PM, Alex Nastetsky <anastet...@spryinc.com> wrote:
> >
> > Table statistics collection was added in HIVE-33 (numRows, rawDataSize, 
> > etc.). Is there anything that lets you define your own statistics to gather?
> >
> > For example, given table A with columns x, y, z, I want to gather 
> > count(distinct x, y) as a statistic that would be stored in the metastore.
> >
> > I know there are column-level statistics that approximate the distinct 
> > count for a single column, but that doesn't help my use case above, where 
> > 2 columns are involved.
> >
> > Thanks,
> > Alex.
> 
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
> 
