Hi, I think Matt's solution is the way to go for now. If you need a basic understanding of how reduce-side and map-side joins work, see [1]; it may help.
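To make Matt's suggestion a bit more concrete, here is a minimal Python sketch of the idea behind a map-side join with a non-equality condition. It is not Hive or Hadoop code, only an illustration: the function name map_side_theta_join, the sample tables and the in_range predicate are all made up for this example, and it assumes the smaller table fits in each mapper's memory.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]


def map_side_theta_join(
    small_table: Iterable[Row],
    large_table: Iterable[Row],
    predicate: Callable[[Row, Row], bool],
) -> Iterator[Row]:
    """Join every large-table row with every small-table row that satisfies
    an arbitrary (theta) predicate, with no shuffle or reduce step."""
    # Assumption: the small table fits in memory, so each mapper can hold a
    # full copy of it (the "broadcast" side of the join).
    small_rows = list(small_table)
    # Each mapper streams its split of the large table and applies the
    # predicate row by row, so the filter runs before anything reaches a reducer.
    for big_row in large_table:
        for small_row in small_rows:
            if predicate(big_row, small_row):
                yield {**big_row, **small_row}


# Hypothetical sample data: match each order to the price tier its amount falls in.
orders = [
    {"order_id": 1, "amount": 5},
    {"order_id": 2, "amount": 50},
    {"order_id": 3, "amount": 500},
]
tiers = [
    {"tier": "small", "lo": 0, "hi": 10},
    {"tier": "medium", "lo": 10, "hi": 100},
]


def in_range(order: Row, tier: Row) -> bool:
    # Non-equality (range) join condition: lo <= amount < hi.
    return tier["lo"] <= order["amount"] < tier["hi"]


joined = list(map_side_theta_join(tiers, orders, in_range))
print([(row["order_id"], row["tier"]) for row in joined])
# prints [(1, 'small'), (2, 'medium')]
```

Note that this is a nested loop inside each mapper, so the work grows with (large table rows) x (small table rows). That is fine when one side is small, which is why the table sizes matter here.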
Regards
Buddhika

[1] http://chamibuddhika.wordpress.com/2012/02/26/joins-with-map-reduce/

On Sat, Mar 17, 2012 at 6:41 AM, Alan Gates <[email protected]> wrote:

> There are algorithms for doing general theta-joins in parallel. Search
> Google on "theta joins parallel database" and you will find some
> interesting references. I am not aware of any tools that implement these
> yet. You can also do it via a cross join followed by a filter, but again
> you need special algorithms to do a cross in MapReduce, which Hive doesn't
> implement yet. See
> http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html
> (search for the section on Cross) for a discussion of how to do cross in
> MapReduce.
>
> Alan.
>
> On Mar 13, 2012, at 10:13 AM, Tucker, Matt wrote:
>
> > For theta joins, you’ll have to convert the query to an equi-join, and
> > then filter for non-equality in the WHERE clause. Depending upon the size
> > of each table, you might consider looking at map-side joins, which will
> > allow for doing non-equality filters during a join before it’s passed to
> > the reducers.
> >
> > Matt Tucker
> >
> > From: mahsa mofidpoor [mailto:[email protected]]
> > Sent: Tuesday, March 13, 2012 1:02 PM
> > To: [email protected]
> > Subject: Re: non-equality joins
> >
> > Hi Keith,
> >
> > Do you know exactly how an algorithm should be in order to fit in the
> > MapReduce framework? Could you refer me to some references?
> >
> > Thanks and Regards,
> > Mahsa
> >
> > On Tue, Mar 13, 2012 at 12:49 PM, Keith Wiley <[email protected]> wrote:
> > https://cwiki.apache.org/Hive/languagemanual-joins.html
> >
> > "Hive does not support join conditions that are not equality conditions
> > as it is very difficult to express such conditions as a map/reduce job."
> >
> > I admit, that isn't a very detailed answer, but it gives some indication
> > of the reason for the discrepancy between Hive and other databases. Hive
> > fundamentally operates on Hadoop, namely on MapReduce (we all know this,
> > I'm just reiterating the train of thought). The problem is that certain
> > algorithms are exceedingly difficult to wedge into the MapReduce framework.
> >
> > That is as detailed as my personal insight can get. I've done a lot of
> > MapReduce programming in Hadoop but I'm not a database expert and I don't
> > really understand the steps involved in various kinds of table-joins, so I
> > don't understand the particular ways in which certain database operations
> > do or do not fit into MapReduce...but presumably nonequality joins
> > (whatever those are :-D ) are particularly difficult to MapReduceify.
> >
> > Cheers!
> >
> > On Mar 13, 2012, at 09:17 , mahsa mofidpoor wrote:
> >
> > > Hello,
> > >
> > > Is there a reason behind not implementing non-equality joins in Hive?
> > > In other words, is there any usage for theta-join, if implemented?
> > >
> > > Thank you in advance for your response,
> > > Mahsa
> >
> > ________________________________________________________________________________
> > Keith Wiley [email protected] keithwiley.com music.keithwiley.com
> >
> > "It's a fine line between meticulous and obsessive-compulsive and a slippery
> > rope between obsessive-compulsive and debilitatingly slow."
> > -- Keith Wiley
> > ________________________________________________________________________________
