Re: skew join optimization

Edward Capriolo Sun, 20 Mar 2011 07:56:53 -0700

On Sun, Mar 20, 2011 at 10:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Can someone re-attach the missing figures for that wiki ?
>
> Thanks
>
> On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada
> <bharathvissapragada1...@gmail.com> wrote:
>>
>> Hi Igor,
>>
>> See http://wiki.apache.org/hadoop/Hive/JoinOptimization and jira 1642,
>> which automatically converts a normal join into a map-join (otherwise
>> you can specify the mapjoin hint in the query itself). Because your 'S'
>> table is very small, it can be replicated across all the mappers and
>> the reduce phase can be avoided. This can greatly reduce the runtime
>> (see the results section on that page for details).
>>
>> Hope this helps.
>>
>> Thanks
>>
>>
>> On Sun, Mar 20, 2011 at 6:37 PM, Jov <zhao6...@gmail.com> wrote:
>> > 2011/3/20 Igor Tatarinov <i...@decide.com>:
>> >> I have the following join that takes 4.5 hours (with 12 nodes) mostly
>> >> because of a single reduce task that gets the bulk of the work:
>> >> SELECT ...
>> >> FROM T
>> >> LEFT OUTER JOIN S
>> >> ON T.timestamp = S.timestamp and T.id = S.id
>> >> This is a 1:0/1 join so the size of the output is exactly the same as
>> >> the size of T (500M records). S is actually very small (5K).
>> >> I've tried:
>> >> - switching the order of the join conditions
>> >> - using a different hash function setting (jenkins instead of murmur)
>> >> - using SET hive.auto.convert.join = true;
>> >
>> > Are you sure your query was converted to a map join? If not, try
>> > using an explicit mapjoin hint.
>> >
>> >
>> >> - using SET hive.optimize.skewjoin = true;
>> >> but nothing helped :(
>> >> Anything else I can try?
>> >> Thanks!
>> >
>>
>>
>>
>> --
>> Regards,
>> Bharath .V
>> w:http://research.iiit.ac.in/~bharath.v
>
>


The wiki does not allow images. Confluence does, but we have not moved there yet.
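
For reference, the explicit hint suggested earlier in the thread might look roughly like the sketch below. T and S are the tables from Igor's query, and the select list stays elided exactly as in his mail. This has not been run against his data, so treat it as an illustration of the hint syntax rather than a verified fix.

```sql
-- Sketch only: T is the large table (500M rows), S the small one (5K rows).
-- The MAPJOIN(S) hint asks Hive to load S into memory on every mapper,
-- so the join runs map-side and no single reducer receives the skewed keys.
-- S is on the right of the LEFT OUTER JOIN, so T is still streamed in full.
SELECT /*+ MAPJOIN(S) */ ...
FROM T
LEFT OUTER JOIN S
  ON T.timestamp = S.timestamp AND T.id = S.id;
```

Prefixing the same statement with EXPLAIN should show whether the plan actually contains a map join instead of a reduce-side join.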
