There are algorithms for doing general theta-joins in parallel. Search Google for "theta joins parallel database" and you will find some interesting references. I am not aware of any tools that implement these yet. You can also do it via a cross join followed by a filter, but again you need a special algorithm to do a cross join in MapReduce, and Hive doesn't implement one yet. See http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html (search for the section on Cross) for a discussion of how to do a cross in MapReduce.
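To make the "cross join followed by a filter" idea concrete, here is a minimal in-memory sketch of one way a cross can be spread over several reducers: arrange the reducers on a grid, send each left record to every cell of one randomly chosen row, and send each right record to every cell of one randomly chosen column. Every (left, right) pair then meets in exactly one cell, where the theta condition is applied. The function name grid_theta_join and the parameters rows, cols, and seed are made up for illustration. This is a plain Python simulation, not Hive or Pig code, and not necessarily how any particular tool implements cross.

```python
import random
from collections import defaultdict
from itertools import product


def grid_theta_join(left, right, predicate, rows=2, cols=2, seed=0):
    """Simulate a theta-join as a cross product on a rows x cols reducer grid.

    Hypothetical helper for illustration only; it runs in memory.
    """
    rng = random.Random(seed)
    # (row, col) -> (left records, right records) routed to that "reducer"
    cells = defaultdict(lambda: ([], []))

    # "Map" side: a left record picks one row and is copied to each column of it.
    for rec in left:
        row = rng.randrange(rows)
        for col in range(cols):
            cells[(row, col)][0].append(rec)

    # A right record picks one column and is copied to each row of it.
    for rec in right:
        col = rng.randrange(cols)
        for row in range(rows):
            cells[(row, col)][1].append(rec)

    # "Reduce" side: each cell crosses its local records, then filters.
    joined = []
    for left_part, right_part in cells.values():
        joined.extend(
            (a, b) for a, b in product(left_part, right_part) if predicate(a, b)
        )
    return joined


# Example: a non-equality join "left value < right value".
pairs = sorted(grid_theta_join([1, 5, 9], [3, 7], lambda a, b: a < b))
```

The cost is replication: each left record is copied cols times and each right record rows times, which is why a cross is expensive even when the filter keeps only a few pairs.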
Alan.

On Mar 13, 2012, at 10:13 AM, Tucker, Matt wrote:

> For theta joins, you’ll have to convert the query to an equi-join, and then
> filter for non-equality in the WHERE clause. Depending upon the size of each
> table, you might consider looking at map-side joins, which will allow for
> doing non-equality filters during a join before it’s passed to the reducers.
>
> Matt Tucker
>
> From: mahsa mofidpoor [mailto:mofidp...@gmail.com]
> Sent: Tuesday, March 13, 2012 1:02 PM
> To: user@hive.apache.org
> Subject: Re: non-equality joins
>
> Hi Keith,
>
> Do you know exactly how an algorithm should be in order to fit in the
> MapReduce framework? Could you refer me to some references?
>
> Thanks and Regards,
> Mahsa
>
> On Tue, Mar 13, 2012 at 12:49 PM, Keith Wiley <kwi...@keithwiley.com> wrote:
> https://cwiki.apache.org/Hive/languagemanual-joins.html
>
> "Hive does not support join conditions that are not equality conditions as it
> is very difficult to express such conditions as a map/reduce job."
>
> I admit, that isn't a very detailed answer, but it gives some indication of
> the reason for the discrepancy between Hive and other databases. Hive
> fundamentally operates on Hadoop, namely on MapReduce (we all know this, I'm
> just reiterating the train of thought). The problem is that certain
> algorithms are exceedingly difficult to wedge into the MapReduce framework.
>
> That is as detailed as my personal insight can get. I've done a lot of
> MapReduce programming in Hadoop but I'm not a database expert and I don't
> really understand the steps involved in various kinds of table-joins, so I
> don't understand the particular ways in which certain database operations do
> or do not fit into MapReduce...but presumably nonequality joins (whatever
> those are :-D ) are particularly difficult to MapReduceify.
>
> Cheers!
>
> On Mar 13, 2012, at 09:17 , mahsa mofidpoor wrote:
>
> > Hello,
> >
> > Is there a reason behind not implementing non-equality joins in Hive? In
> > other words, is there any usage for theta-join, if implemented?
> >
> > Thank you in advance for your response,
> > Mahsa
>
> ________________________________________________________________________________
> Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "It's a fine line between meticulous and obsessive-compulsive and a slippery
> rope between obsessive-compulsive and debilitatingly slow."
> -- Keith Wiley
> ________________________________________________________________________________