Re: non-equality joins

Alan Gates Fri, 16 Mar 2012 17:42:32 -0700

There are algorithms for doing general theta-joins in parallel.  Search Google 
for "theta joins parallel database" and you will find some interesting 
references.  I am not aware of any tools that implement these yet.  You can 
also do it via a cross join followed by a filter, but again you need special 
algorithms to do a cross in MapReduce, which Hive doesn't implement yet.  See 
http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html (search 
for the section on Cross) for a discussion of how to do cross in MapReduce.
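
To make the cross-then-filter idea concrete, here is a minimal Python sketch 
that simulates the map, shuffle, and reduce steps in memory.  It is an 
illustration only, not Hive's or Pig's actual code.  The grid layout, the 
function names (map_phase, shuffle, reduce_phase, theta_join), and the use of 
a record's index in place of a random row or column choice are all assumptions 
made for the example.  The idea is to lay the reducers out as a rows x cols 
grid, copy each left record to every cell of one row and each right record to 
every cell of one column, so that every (left, right) pair meets in exactly 
one cell, where the theta condition is applied as a filter.

```python
from collections import defaultdict
from itertools import product


def map_phase(left, right, rows, cols):
    """Emit (cell, (tag, record)) pairs, replicating each record along one grid line."""
    for i, rec in enumerate(left):
        r = i % rows  # hypothetical stand-in for picking a random row
        for c in range(cols):
            yield (r, c), ("L", rec)
    for j, rec in enumerate(right):
        c = j % cols  # hypothetical stand-in for picking a random column
        for r in range(rows):
            yield (r, c), ("R", rec)


def shuffle(pairs):
    """Group mapper output by grid cell, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, theta):
    """In each cell, cross the local left and right records and keep pairs passing theta."""
    for values in groups.values():
        lefts = [rec for tag, rec in values if tag == "L"]
        rights = [rec for tag, rec in values if tag == "R"]
        for l, r in product(lefts, rights):
            if theta(l, r):
                yield l, r


def theta_join(left, right, theta, rows=2, cols=2):
    """Cross join on a rows x cols reducer grid, followed by a theta filter."""
    return sorted(reduce_phase(shuffle(map_phase(left, right, rows, cols)), theta))


result = theta_join([1, 5, 9], [3, 4, 10], lambda l, r: l < r)
print(result)  # [(1, 3), (1, 4), (1, 10), (5, 10), (9, 10)]
```

Each left record is shipped cols times and each right record rows times, so 
the replication cost grows with the grid size.  That shuffle overhead is the 
price paid for spreading the cross product over several reducers.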


Alan.

On Mar 13, 2012, at 10:13 AM, Tucker, Matt wrote:

> For theta joins, you’ll have to convert the query to an equi-join, and then 
> filter for non-equality in the WHERE clause.  Depending upon the size of each 
> table, you might consider looking at map-side joins, which will allow for 
> doing non-equality filters during a join before it’s passed to the reducers.
>  
> Matt Tucker
>  
> From: mahsa mofidpoor [mailto:mofidp...@gmail.com] 
> Sent: Tuesday, March 13, 2012 1:02 PM
> To: user@hive.apache.org
> Subject: Re: non-equality joins
>  
>  
> Hi Keith,
>  
> Do you know exactly what form an algorithm must take in order to fit into 
> the MapReduce framework? Could you point me to some references?
>  
> Thanks and Regards,
> Mahsa
>  
>  
>  
> On Tue, Mar 13, 2012 at 12:49 PM, Keith Wiley <kwi...@keithwiley.com> wrote:
> https://cwiki.apache.org/Hive/languagemanual-joins.html
> 
> "Hive does not support join conditions that are not equality conditions as it 
> is very difficult to express such conditions as a map/reduce job."
> 
> I admit, that isn't a very detailed answer, but it gives some indication of 
> the reason for the discrepancy between Hive and other databases.  Hive 
> fundamentally operates on Hadoop, namely on MapReduce (we all know this, I'm 
> just reiterating the train of thought).  The problem is that certain 
> algorithms are exceedingly difficult to wedge into the MapReduce framework.
> 
> That is as detailed as my personal insight can get.  I've done a lot of 
> MapReduce programming in Hadoop but I'm not a database expert and I don't 
> really understand the steps involved in various kinds of table-joins, so I 
> don't understand the particular ways in which certain database operations do 
> or do not fit into MapReduce...but presumably nonequality joins (whatever 
> those are :-D ) are particularly difficult to MapReduceify.
> 
> Cheers!
> 
> On Mar 13, 2012, at 09:17 , mahsa mofidpoor wrote:
> 
> > Hello,
> >
> > Is there a reason why non-equality joins are not implemented in Hive? In 
> > other words, would theta-joins be of any use if they were implemented?
> >
> > Thank you in advance for your response,
> > Mahsa
> 
> 
> ________________________________________________________________________________
> Keith Wiley     kwi...@keithwiley.com     keithwiley.com    
> music.keithwiley.com
> 
> "It's a fine line between meticulous and obsessive-compulsive and a slippery
> rope between obsessive-compulsive and debilitatingly slow."
>                                           --  Keith Wiley
> ________________________________________________________________________________
> 
>
