Re: custom table/column statistics

2014-06-09 Thread Prasanth Jayachandran
Hi Alex, here is the JIRA that tracks column group statistics: https://issues.apache.org/jira/browse/HIVE-6540 Computing count distinct accurately demands lots of memory, esp. in cases where there are many distinct values. To overcome such a huge memory requirement, probabilistic data structures
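
The memory trade-off described above can be illustrated with a small sketch. The snippet is cut off before it names a specific structure, so the K-Minimum-Values (KMV) estimator below is only an assumption chosen for its simplicity; it is not presented as what Hive implements. The class name KMVSketch, the parameter k, and the SHA-1 hashing are all hypothetical choices. The idea: keep only the k smallest hash values seen, and infer the number of distinct values from how small the k-th smallest hash is.

```python
import hashlib
import heapq

HASH_SPACE = 2 ** 64  # hashes are treated as uniform integers in [0, 2**64)


class KMVSketch:
    """Hypothetical K-Minimum-Values sketch: approximate distinct count in O(k) memory."""

    def __init__(self, k=1024):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self._heap = []     # negated hashes, i.e. a max-heap over the k smallest hashes
        self._kept = set()  # hashes currently in the heap, used to skip duplicates

    @staticmethod
    def _hash(value):
        # Take the first 8 bytes of SHA-1 as a 64-bit integer hash.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value):
        h = self._hash(value)
        if h in self._kept:
            return  # already tracked; duplicates never change the sketch
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._kept.add(h)
        elif h < -self._heap[0]:
            # h is smaller than the largest kept hash: swap it in.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._kept.discard(evicted)
            self._kept.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return len(self._heap)  # fewer than k distinct hashes seen: count is exact
        kth_smallest = -self._heap[0]
        return (self.k - 1) * HASH_SPACE // kth_smallest
```

An exact count has to remember every distinct value, while this sketch never holds more than k hashes no matter how many rows it sees. That is the general reason a distinct-value statistic may be reported as an approximation; the error shrinks as k grows.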

Re: custom table/column statistics

2014-06-09 Thread Alex Nastetsky
Thanks Prasanth. Do we already have a ticket to add support for this or should I create one? Also, do you know why the single column distinct value is only an approximation instead of exact? Thanks. On Sun, Jun 8, 2014 at 10:13 PM, Prasanth Jayachandran < pjayachand...@hortonworks.com> wrote:

Re: Efficient Equality Joins of Large Tables

2014-06-09 Thread Sarfraz Ramay
Maybe a UDF could solve your problem. Regards, Sarfraz Rasheed Ramay (DIT) Dublin, Ireland. On Mon, Jun 9, 2014 at 7:30 PM, Mark Desnoyer wrote: > Hi Furcy, > > Thanks for the reply. I looked at MapJoin but it won't do what I need > because all the tables will be large and actually, explicitly

Re: Efficient Equality Joins of Large Tables

2014-06-09 Thread Mark Desnoyer
Hi Furcy, Thanks for the reply. I looked at MapJoin but it won't do what I need because all the tables will be large and actually, explicitly going through the entire table in an n^2 fashion is very inefficient. I have large tables, but the intersection is very small. In the Ad Click case, I woul

Re: Efficient Equality Joins of Large Tables

2014-06-09 Thread Furcy Pin
Hi Mark, I'm not sure if I understand correctly what you're trying to do. Do you know the reference id on which you want to do the join beforehand? Or is one of your tables small? Or are they all big with a small intersection? If you haven't yet, I would suggest you have a look at MapJoin: https:

Efficient Equality Joins of Large Tables

2014-06-09 Thread Mark Desnoyer
Hi, I was wondering if there was a way in Hive to trigger it to perform an efficient equality join on large tables? Specifically, I have two or more tables where the joined key is relatively rare in each table. A good example would be an AdClick scenario where you would have two tables, one for ad
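
As general background to the question above, here is a minimal sketch of a hash-based equality join. It shows that an equality join does not need to compare every pair of rows. It is plain Python rather than Hive, and the function name hash_equi_join, the dict-shaped rows, and the key name are all hypothetical. It also assumes one side's index fits in memory, which may not hold for the large tables discussed in this thread.

```python
from collections import defaultdict


def hash_equi_join(left_rows, right_rows, key):
    """Yield merged rows whose `key` values match; roughly O(n + m + matches)."""
    # Build phase: index the left side by join key.
    index = defaultdict(list)
    for row in left_rows:
        index[row[key]].append(row)

    # Probe phase: each right row is looked up once instead of being
    # compared against every left row.
    for right in right_rows:
        for left in index.get(right[key], ()):
            yield {**left, **right}
```

For example, joining 1,000 left rows against 2 right rows on a hypothetical "id" key does 1,000 inserts and 2 lookups, not 2,000 comparisons. When the intersection is small, most probes simply miss.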

Re: UDF development group details

2014-06-09 Thread Lefty Leverenz
The Hive wiki has useful information for contributors here: https://cwiki.apache.org/confluence/display/Hive/Home#Home-ResourcesforContributors . -- Lefty On Mon, Jun 9, 2014 at 2:19 AM, Nitin Pawar wrote: > there is no custom group for U

Re: UDF development group details

2014-06-09 Thread Nitin Pawar
There is no custom group for UDFs only. You can post your questions to either the hive-user group or the hive-dev group. On Mon, Jun 9, 2014 at 2:44 PM, Devopam Mittra wrote: > hi All, > Can you please redirect/connect me with Hive custom UDF development group. > > My intent is to create / co-develop c

UDF development group details

2014-06-09 Thread Devopam Mittra
Hi all, can you please redirect/connect me with the Hive custom UDF development group? My intent is to create / co-develop custom UDFs for text analytics and data mining over Hive directly. -- Devopam Mittra Life and Relations are not binary