Hi Alex,
Here is the JIRA that tracks column group statistics:
https://issues.apache.org/jira/browse/HIVE-6540
Computing an exact distinct count demands a lot of memory, especially when a
column has many distinct values. To overcome this memory requirement,
probabilistic data structures
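
To make the memory trade-off above concrete, here is a minimal sketch of one probabilistic approach, a k-minimum-values (KMV) estimator, in Python. It is not Hive's implementation; the names hash01 and estimate_distinct and the default k are assumptions chosen purely for illustration. An exact count has to remember every distinct value, while this sketch remembers at most k hashes regardless of input size.

```python
import hashlib
import heapq


def hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1] (illustrative helper)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2.0 ** 64


def estimate_distinct(values, k=256):
    """Estimate the number of distinct values while keeping at most k hashes."""
    heap = []     # negated hashes, so heap[0] is -(largest hash currently kept)
    kept = set()  # the same hashes, used to skip duplicates
    for value in values:
        h = hash01(value)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes were seen, so the count is exact.
        return len(heap)
    # The k-th smallest hash sits near k / n, which gives n ~ (k - 1) / h_k.
    return int((k - 1) / -heap[0])
```

A larger k gives a tighter estimate at the cost of more memory; the relative error shrinks roughly as 1/sqrt(k).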
Thanks Prasanth.
Do we already have a ticket to add support for this or should I create one?
Also, do you know why the single-column distinct value count is only an
approximation instead of exact?
Thanks.
On Sun, Jun 8, 2014 at 10:13 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:
Column group statistics are not supported in Hive yet.
Thanks
Prasanth
> On Jun 8, 2014, at 6:33 PM, Alex Nastetsky wrote:
>
> Table statistics collection was added in HIVE-33 (numRows, rawDataSize, etc).
> Is there anything that lets you create your own statistics gatheri