Re: custom table/column statistics

2014-06-09, Prasanth Jayachandran
Hi Alex, here is the JIRA that tracks column group statistics: https://issues.apache.org/jira/browse/HIVE-6540. Computing count distinct accurately demands a lot of memory, especially when there are very many distinct values. To overcome such a huge memory requirement, probabilistic data structures…
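The message is cut off before it names a specific data structure, so the following is only an illustration of the general trade-off, not Hive's NDV implementation. It is a minimal K-Minimum-Values (KMV) distinct-count sketch in Python, chosen here as an assumption because it is short; the class name `KMVSketch` and the parameter `k` are hypothetical. An exact count must remember every distinct value, while this sketch keeps only the `k` smallest hash values and estimates the count as `(k - 1) / h_k`, with a relative error of roughly `1/sqrt(k)`. Passing a tuple of column values counts distinct combinations, which is the column-group case.

```python
import hashlib
import heapq


class KMVSketch:
    """Hypothetical K-Minimum-Values distinct-count estimator (illustrative only)."""

    def __init__(self, k: int = 1024):
        self.k = k
        # Max-heap (via negation) holding the k smallest hashes seen so far.
        self._heap: list[float] = []
        self._members: set[float] = set()

    @staticmethod
    def _hash01(value) -> float:
        # Map any value (or tuple of column values) to a float in [0, 1).
        digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") / 2**64

    def add(self, value) -> None:
        h = self._hash01(value)
        if h in self._members:
            return  # duplicate value: already accounted for
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the current largest of the k smallest hashes.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            # Fewer than k distinct hashes seen: the count is exact.
            return float(len(self._heap))
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


if __name__ == "__main__":
    exact: set[int] = set()
    sketch = KMVSketch(k=1024)
    for i in range(200_000):
        value = i % 50_000
        exact.add(value)   # memory grows with the number of distinct values
        sketch.add(value)  # memory stays bounded by k
    print("exact:", len(exact), "approx:", round(sketch.estimate()))
```

Memory for the sketch is fixed by `k` no matter how many distinct values arrive; raising `k` lowers the error at the cost of more memory, which is the same exactness-versus-memory trade-off described above.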

2014-06-09, Alex Nastetsky
Thanks, Prasanth. Do we already have a ticket to add support for this, or should I create one? Also, do you know why the single-column distinct value count is only an approximation rather than exact? Thanks. On Sun, Jun 8, 2014 at 10:13 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:

2014-06-08, Prasanth Jayachandran
Column group statistics are not supported in Hive yet. Thanks, Prasanth > On Jun 8, 2014, at 6:33 PM, Alex Nastetsky wrote: > > Table statistics collection was added in HIVE-33 (numRows, rawDataSize, etc.). > Is there anything that lets you create your own statistics gathering…