You should check out the work being done on. non-equi map joins http://mail-archives.apache.org/mod_mbox/hive-dev/201206.mbox/%3C1948451998.13482.1339612423225.JavaMail.jiratomcat@issues-vm%3E
https://issues.apache.org/jira/browse/HIVE-3133 On Fri, Aug 17, 2012 at 1:52 AM, Bertrand Dechoux <decho...@gmail.com> wrote: > What are the data volume? And what are the meaning of those data? > > From what I can see, you have a 'pack' per day. If that's true, a map join > could be used because you should not have that many pack creation (But I am > not sure how to enforce that.) > I so filtering could happen right after. You would indeed generate lots of > tuple but they wouldn't be transported over the network nor written to disk. > > Even better if you really have (at least) a pack per day then you only need > to group each request with three pack creation : the day before, the current > day and the day after. > > Regards > > Bertrand > > On Fri, Aug 17, 2012 at 1:27 AM, Navis류승우 <navis....@nexr.com> wrote: >> >> If you don't specify join condition, hive performs cross join. >> >> What is added to hive 0.10.0 is just a clarifying grammar. >> >> >> 2012/8/17 Himanish Kushary <himan...@gmail.com> >>> >>> We are on Hive 0.8 , I think cross join is available only since 0.10.0 >>> >>> Do we have any other options ? >>> >>> On Thu, Aug 16, 2012 at 2:28 PM, Ablimit Aji <abli...@gmail.com> wrote: >>> > You can do a CROSS JOIN, then filter with the original inequality join >>> > condition. >>> > This would generate a lot of redundant tuples and may not work if you >>> > have >>> > large amounts of data. >>> > >>> > On Thu, Aug 16, 2012 at 2:07 PM, Himanish Kushary <himan...@gmail.com> >>> > wrote: >>> >> >>> >> Hi, >>> >> >>> >> We have two tables in the following structure : >>> >> >>> >> Table1 : >>> >> >>> >> | id | packcreatetime | packid | >>> >> ---------------------------------------------------------------------- >>> >> | 505 | 2012-07-16 11:51:12 | 111024 | >>> >> | 505 | 2012-07-18 11:52:13 | 111025 | >>> >> | 505 | 2012-07-19 11:53:14 | 111026 | >>> >> | 504 | 2012-07-17 23:50:13 | 101020 | >>> >> >>> >> ------------------------------------------------------------------------ >>> >> >>> >> Table-2 >>> >> >>> >> | id | requesttime >>> >> ---------------------------------------- >>> >> | 505 | 2012-07-18 12:09:47 >>> >> | 505 | 2012-07-19 12:09:59 >>> >> | 505 | 2012-07-19 12:09:56 >>> >> | 505 | 2012-07-17 12:06:40 >>> >> | 505 | 2012-07-17 12:06:40 >>> >> | 505 | 2012-07-17 12:09:15 >>> >> | 504 | 2012-07-18 00:03:18 >>> >> | 504 | 2012-07-18 00:15:41 >>> >> >>> >> We want to find out the packid from Table1 where the is corresponding >>> >> in Table2 and the requesttime(in Table2) is between the >>> >> packcreatetime of two relevant records(in Table1) >>> >> >>> >> So for the above example the final output will be: >>> >> >>> >> | id | requesttime | packid >>> >> ------------------------------------------------------- >>> >> | 505 | 2012-07-18 12:09:47 | 111025 >>> >> | 505 | 2012-07-19 12:09:59 | 111026 >>> >> | 505 | 2012-07-19 12:09:56 | 111026 >>> >> | 505 | 2012-07-17 12:06:40 | 111024 >>> >> | 505 | 2012-07-17 12:06:40 | 111024 >>> >> | 505 | 2012-07-17 12:09:15 | 111024 >>> >> | 504 | 2012-07-18 00:03:18 | 101020 >>> >> | 504 | 2012-07-18 00:15:41 | 101020 >>> >> >>> >> >>> >> As we cannot use >= , <= in Hive joins the between logic cannot be >>> >> implemented in joins, is there any way to accomplish this or do we >>> >> need to write custom M/R code for this.Looking forward for any >>> >> suggestions to accomplish this. >>> >> >>> >> -- >>> >> Thanks & Regards >>> >> Himanish >>> > >>> > >>> >>> >>> >>> -- >>> Thanks & Regards >>> Himanish >> >> > > > > -- > Bertrand Dechoux