You should check out the work being done on non-equi map joins:
http://mail-archives.apache.org/mod_mbox/hive-dev/201206.mbox/%3C1948451998.13482.1339612423225.JavaMail.jiratomcat@issues-vm%3E

https://issues.apache.org/jira/browse/HIVE-3133

On Fri, Aug 17, 2012 at 1:52 AM, Bertrand Dechoux <decho...@gmail.com> wrote:
> What are the data volumes? And what do these data represent?
>
> From what I can see, you have a 'pack' per day. If that's true, a map join
> could be used, because you should not have that many pack creations (but I
> am not sure how to enforce that).
> The filtering could then happen right after the join. You would indeed
> generate lots of tuples, but they would be neither transported over the
> network nor written to disk.
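The map-join idea above can be sketched in plain Python. This is not Hive code and does not model how Hive actually builds or ships its hash table; it only shows the shape of the computation. The small Table1 is grouped by id into an in-memory dictionary, each row of Table2 is streamed past it, and the inequality filter is applied to each pair as soon as it is formed. The names TABLE1, TABLE2, build_small_side and map_side_assign are hypothetical, and "latest pack created at or before the request" is an assumed reading of the original question.

```python
from collections import defaultdict

# Sample rows from Table1: (id, packcreatetime, packid)
TABLE1 = [
    (505, "2012-07-16 11:51:12", 111024),
    (505, "2012-07-18 11:52:13", 111025),
    (505, "2012-07-19 11:53:14", 111026),
    (504, "2012-07-17 23:50:13", 101020),
]

# Sample rows from Table2: (id, requesttime)
TABLE2 = [
    (505, "2012-07-18 12:09:47"),
    (505, "2012-07-19 12:09:59"),
    (505, "2012-07-19 12:09:56"),
    (505, "2012-07-17 12:06:40"),
    (505, "2012-07-17 12:06:40"),
    (505, "2012-07-17 12:09:15"),
    (504, "2012-07-18 00:03:18"),
    (504, "2012-07-18 00:15:41"),
]


def build_small_side(packs):
    """Group the small table by id, as a map join would hold it in memory."""
    by_id = defaultdict(list)
    for pid, ptime, packid in packs:
        by_id[pid].append((ptime, packid))
    return by_id


def map_side_assign(small_side, requests):
    """Stream the large table; each request is compared only against the
    in-memory packs of its id, and only rows that pass the filter are emitted."""
    for rid, rtime in requests:
        # "YYYY-MM-DD HH:MM:SS" strings sort chronologically, so a plain
        # string comparison implements packcreatetime <= requesttime.
        candidates = [c for c in small_side.get(rid, []) if c[0] <= rtime]
        if candidates:
            latest = max(candidates, key=lambda c: c[0])
            yield rid, rtime, latest[1]


if __name__ == "__main__":
    small = build_small_side(TABLE1)
    for row in map_side_assign(small, TABLE2):
        print(row)
```

The intermediate pairs exist only inside the loop, which is the property Bertrand points to: the blow-up stays in memory on the map side instead of being written out.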
>
> Even better: if you really have (at least) one pack per day, then you only
> need to match each request against three pack creations: the day before,
> the current day and the day after.
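The one-pack-per-day shortcut turns the range condition into an equality on (id, day). A minimal Python sketch, assuming timestamps in the "YYYY-MM-DD HH:MM:SS" format of the sample tables: every pack is copied under three day keys, each request looks up only its own day key, and the time filter then picks the latest earlier pack. The helper names (day_of, bucket_packs, bucketed_assign) are hypothetical, and the sketch is only correct when the matching pack is at most one day older than the request, which one-pack-per-day would guarantee.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from Table1: (id, packcreatetime, packid)
TABLE1 = [
    (505, "2012-07-16 11:51:12", 111024),
    (505, "2012-07-18 11:52:13", 111025),
    (505, "2012-07-19 11:53:14", 111026),
    (504, "2012-07-17 23:50:13", 101020),
]

# Sample rows from Table2: (id, requesttime)
TABLE2 = [
    (505, "2012-07-18 12:09:47"),
    (505, "2012-07-19 12:09:59"),
    (505, "2012-07-19 12:09:56"),
    (505, "2012-07-17 12:06:40"),
    (505, "2012-07-17 12:06:40"),
    (505, "2012-07-17 12:09:15"),
    (504, "2012-07-18 00:03:18"),
    (504, "2012-07-18 00:15:41"),
]


def day_of(ts):
    """Calendar day of a 'YYYY-MM-DD HH:MM:SS' timestamp."""
    return datetime.strptime(ts, FMT).date()


def bucket_packs(packs):
    """Copy each pack under (id, day - 1), (id, day) and (id, day + 1), so a
    request on day D meets exactly the packs created on D - 1, D and D + 1."""
    buckets = defaultdict(list)
    for pid, ptime, packid in packs:
        d = day_of(ptime)
        for shift in (-1, 0, 1):
            buckets[(pid, d + timedelta(days=shift))].append((ptime, packid))
    return buckets


def bucketed_assign(packs, requests):
    """Equality lookup on (id, request day), then keep the latest pack created
    at or before the request. A matching pack older than one day is missed."""
    buckets = bucket_packs(packs)
    out = []
    for rid, rtime in requests:
        key = (rid, day_of(rtime))
        candidates = [c for c in buckets.get(key, []) if c[0] <= rtime]
        if candidates:
            out.append((rid, rtime, max(candidates, key=lambda c: c[0])[1]))
    return out


if __name__ == "__main__":
    for row in bucketed_assign(TABLE1, TABLE2):
        print(row)
```

Each pack row is replicated three times, so the join input grows by a constant factor of 3 instead of by the size of Table2. In this formulation the day-after copy is always filtered out; it would only matter if you also wanted to check the upper bound (the next pack) explicitly.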
>
> Regards
>
> Bertrand
>
> On Fri, Aug 17, 2012 at 1:27 AM, Navis류승우 <navis....@nexr.com> wrote:
>>
>> If you don't specify a join condition, Hive performs a cross join.
>>
>> What was added in Hive 0.10.0 is just clarifying syntax (the explicit CROSS JOIN keyword).
>>
>>
>> 2012/8/17 Himanish Kushary <himan...@gmail.com>
>>>
>>> We are on Hive 0.8; I think CROSS JOIN is only available since 0.10.0.
>>>
>>> Do we have any other options?
>>>
>>> On Thu, Aug 16, 2012 at 2:28 PM, Ablimit Aji <abli...@gmail.com> wrote:
>>> > You can do a CROSS JOIN, then filter with the original inequality join
>>> > condition.
>>> > This would generate a lot of redundant tuples and may not work if you
>>> > have large amounts of data.
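A small Python simulation of the cross-join-then-filter approach on the sample rows from the original question. It is not Hive code; the names are hypothetical, and "keep the latest pack created at or before the request" is an assumed reading of the requirement. It also counts how many rows the cross join produces before the filter discards most of them.

```python
from itertools import product

# Sample rows from Table1: (id, packcreatetime, packid)
TABLE1 = [
    (505, "2012-07-16 11:51:12", 111024),
    (505, "2012-07-18 11:52:13", 111025),
    (505, "2012-07-19 11:53:14", 111026),
    (504, "2012-07-17 23:50:13", 101020),
]

# Sample rows from Table2: (id, requesttime)
TABLE2 = [
    (505, "2012-07-18 12:09:47"),
    (505, "2012-07-19 12:09:59"),
    (505, "2012-07-19 12:09:56"),
    (505, "2012-07-17 12:06:40"),
    (505, "2012-07-17 12:06:40"),
    (505, "2012-07-17 12:09:15"),
    (504, "2012-07-18 00:03:18"),
    (504, "2012-07-18 00:15:41"),
]


def cross_join_then_filter(packs, requests):
    """Return (cross-join row count, rows kept by the filter, final rows)."""
    indexed = list(enumerate(requests))  # the index keeps duplicate requests apart
    # Cross join: every pack paired with every request.
    pairs = list(product(packs, indexed))
    # Filter: same id, and the pack was created at or before the request.
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain strings.
    kept = [(p, i, r) for p, (i, r) in pairs if p[0] == r[0] and p[1] <= r[1]]
    # For each request, keep only the latest qualifying pack.
    best = {}
    for p, i, _ in kept:
        if i not in best or p[1] > best[i][1]:
            best[i] = p
    rows = [(r[0], r[1], best[i][2]) for i, r in indexed if i in best]
    return len(pairs), len(kept), rows


n_pairs, n_kept, rows = cross_join_then_filter(TABLE1, TABLE2)
print(n_pairs, n_kept)
for row in rows:
    print(row)
```

With only 4 × 8 sample rows the cross join already materializes 32 pairs to produce 8 answers, and the pair count grows as |Table1| × |Table2|, which is the scaling concern raised above. Because the ids must match anyway, restricting the product to rows with equal id (an equality join on id, with the time comparison applied afterwards) should shrink it considerably; that Hive mapping is our assumption and is not shown here.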
>>> >
>>> > On Thu, Aug 16, 2012 at 2:07 PM, Himanish Kushary <himan...@gmail.com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> We have two tables with the following structure:
>>> >>
>>> >> Table1:
>>> >>
>>> >> | id  | packcreatetime      | packid |
>>> >> --------------------------------------
>>> >> | 505 | 2012-07-16 11:51:12 | 111024 |
>>> >> | 505 | 2012-07-18 11:52:13 | 111025 |
>>> >> | 505 | 2012-07-19 11:53:14 | 111026 |
>>> >> | 504 | 2012-07-17 23:50:13 | 101020 |
>>> >> --------------------------------------
>>> >>
>>> >> Table2:
>>> >>
>>> >> | id  | requesttime         |
>>> >> -----------------------------
>>> >> | 505 | 2012-07-18 12:09:47 |
>>> >> | 505 | 2012-07-19 12:09:59 |
>>> >> | 505 | 2012-07-19 12:09:56 |
>>> >> | 505 | 2012-07-17 12:06:40 |
>>> >> | 505 | 2012-07-17 12:06:40 |
>>> >> | 505 | 2012-07-17 12:09:15 |
>>> >> | 504 | 2012-07-18 00:03:18 |
>>> >> | 504 | 2012-07-18 00:15:41 |
>>> >>
>>> >> We want to find the packid from Table1 whose id matches the id in Table2
>>> >> and where the requesttime (in Table2) falls between the packcreatetime
>>> >> values of two consecutive records for that id (in Table1).
>>> >>
>>> >> So for the above example the final output will be:
>>> >>
>>> >> | id  | requesttime         | packid |
>>> >> --------------------------------------
>>> >> | 505 | 2012-07-18 12:09:47 | 111025 |
>>> >> | 505 | 2012-07-19 12:09:59 | 111026 |
>>> >> | 505 | 2012-07-19 12:09:56 | 111026 |
>>> >> | 505 | 2012-07-17 12:06:40 | 111024 |
>>> >> | 505 | 2012-07-17 12:06:40 | 111024 |
>>> >> | 505 | 2012-07-17 12:09:15 | 111024 |
>>> >> | 504 | 2012-07-18 00:03:18 | 101020 |
>>> >> | 504 | 2012-07-18 00:15:41 | 101020 |
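To pin down the intended semantics, here is a brute-force Python reference (a sketch, not a Hive solution) that reproduces the expected output from the two sample tables. It assumes "between two records" means: for the same id, take the pack with the latest packcreatetime at or before requesttime, so the request falls before the next pack of that id. TABLE1, TABLE2 and assign_packs are hypothetical names.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from Table1: (id, packcreatetime, packid)
TABLE1 = [
    (505, "2012-07-16 11:51:12", 111024),
    (505, "2012-07-18 11:52:13", 111025),
    (505, "2012-07-19 11:53:14", 111026),
    (504, "2012-07-17 23:50:13", 101020),
]

# Sample rows from Table2: (id, requesttime)
TABLE2 = [
    (505, "2012-07-18 12:09:47"),
    (505, "2012-07-19 12:09:59"),
    (505, "2012-07-19 12:09:56"),
    (505, "2012-07-17 12:06:40"),
    (505, "2012-07-17 12:06:40"),
    (505, "2012-07-17 12:09:15"),
    (504, "2012-07-18 00:03:18"),
    (504, "2012-07-18 00:15:41"),
]


def assign_packs(packs, requests):
    """Return (id, requesttime, packid) for every request that has an
    earlier pack with the same id; other requests are dropped, like an
    inner join. Duplicate requests are kept, as in the expected output."""
    result = []
    for rid, rtime in requests:
        t = datetime.strptime(rtime, FMT)
        best = None  # (packcreatetime, packid) of the latest earlier pack so far
        for pid, ptime, packid in packs:
            pt = datetime.strptime(ptime, FMT)
            if pid == rid and pt <= t and (best is None or pt > best[0]):
                best = (pt, packid)
        if best is not None:
            result.append((rid, rtime, best[1]))
    return result


if __name__ == "__main__":
    for row in assign_packs(TABLE1, TABLE2):
        print(row)
```

This is quadratic and only meant as a correctness oracle for whatever Hive query or custom MapReduce job is eventually used.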
>>> >>
>>> >>
>>> >> Since we cannot use >= or <= in Hive join conditions, the BETWEEN logic
>>> >> cannot be expressed as a join. Is there any way to accomplish this, or
>>> >> do we need to write custom MapReduce code for it? Looking forward to any
>>> >> suggestions.
>>> >>
>>> >> --
>>> >> Thanks & Regards
>>> >> Himanish
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Thanks & Regards
>>> Himanish
>>
>>
>
>
>
> --
> Bertrand Dechoux
