What are the data volume? And what are the meaning of those data? >From what I can see, you have a 'pack' per day. If that's true, a map join could be used because you should not have that many pack creation (But I am not sure how to enforce that.) I so filtering could happen right after. You would indeed generate lots of tuple but they wouldn't be transported over the network nor written to disk.
Even better if you really have (at least) a pack per day then you only need to group each request with three pack creation : the day before, the current day and the day after. Regards Bertrand On Fri, Aug 17, 2012 at 1:27 AM, Navis류승우 <navis....@nexr.com> wrote: > If you don't specify join condition, hive performs cross join. > > What is added to hive 0.10.0 is just a clarifying grammar. > > > 2012/8/17 Himanish Kushary <himan...@gmail.com> > >> We are on Hive 0.8 , I think cross join is available only since 0.10.0 >> >> Do we have any other options ? >> >> On Thu, Aug 16, 2012 at 2:28 PM, Ablimit Aji <abli...@gmail.com> wrote: >> > You can do a CROSS JOIN, then filter with the original inequality join >> > condition. >> > This would generate a lot of redundant tuples and may not work if you >> have >> > large amounts of data. >> > >> > On Thu, Aug 16, 2012 at 2:07 PM, Himanish Kushary <himan...@gmail.com> >> > wrote: >> >> >> >> Hi, >> >> >> >> We have two tables in the following structure : >> >> >> >> Table1 : >> >> >> >> | id | packcreatetime | packid | >> >> ---------------------------------------------------------------------- >> >> | 505 | 2012-07-16 11:51:12 | 111024 | >> >> | 505 | 2012-07-18 11:52:13 | 111025 | >> >> | 505 | 2012-07-19 11:53:14 | 111026 | >> >> | 504 | 2012-07-17 23:50:13 | 101020 | >> >> >> ------------------------------------------------------------------------ >> >> >> >> Table-2 >> >> >> >> | id | requesttime >> >> ---------------------------------------- >> >> | 505 | 2012-07-18 12:09:47 >> >> | 505 | 2012-07-19 12:09:59 >> >> | 505 | 2012-07-19 12:09:56 >> >> | 505 | 2012-07-17 12:06:40 >> >> | 505 | 2012-07-17 12:06:40 >> >> | 505 | 2012-07-17 12:09:15 >> >> | 504 | 2012-07-18 00:03:18 >> >> | 504 | 2012-07-18 00:15:41 >> >> >> >> We want to find out the packid from Table1 where the is corresponding >> >> in Table2 and the requesttime(in Table2) is between the >> >> packcreatetime of two relevant records(in Table1) >> >> >> >> So for the above example the final output will be: >> >> >> >> | id | requesttime | packid >> >> ------------------------------------------------------- >> >> | 505 | 2012-07-18 12:09:47 | 111025 >> >> | 505 | 2012-07-19 12:09:59 | 111026 >> >> | 505 | 2012-07-19 12:09:56 | 111026 >> >> | 505 | 2012-07-17 12:06:40 | 111024 >> >> | 505 | 2012-07-17 12:06:40 | 111024 >> >> | 505 | 2012-07-17 12:09:15 | 111024 >> >> | 504 | 2012-07-18 00:03:18 | 101020 >> >> | 504 | 2012-07-18 00:15:41 | 101020 >> >> >> >> >> >> As we cannot use >= , <= in Hive joins the between logic cannot be >> >> implemented in joins, is there any way to accomplish this or do we >> >> need to write custom M/R code for this.Looking forward for any >> >> suggestions to accomplish this. >> >> >> >> -- >> >> Thanks & Regards >> >> Himanish >> > >> > >> >> >> >> -- >> Thanks & Regards >> Himanish >> > > -- Bertrand Dechoux