Re: skew join optimization

Ted Yu Sun, 20 Mar 2011 07:30:58 -0700

Can someone re-attach the missing figures on that wiki page?

Thanks


On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada <
bharathvissapragada1...@gmail.com> wrote:

> Hi Igor,
>
> See http://wiki.apache.org/hadoop/Hive/JoinOptimization and Hive JIRA
> issue 1642, which automatically converts a normal join into a map join
> (otherwise you can specify a mapjoin hint in the query itself).
> Because your 'S' table is very small, it can be replicated to all the
> mappers, so the reduce phase can be avoided. This can greatly reduce
> the runtime (see the results section of that page for details).
>
> Hope this helps.
>
> Thanks
>
>
> On Sun, Mar 20, 2011 at 6:37 PM, Jov <zhao6...@gmail.com> wrote:
> > 2011/3/20 Igor Tatarinov <i...@decide.com>:
> >> I have the following join that takes 4.5 hours (with 12 nodes) mostly
> >> because of a single reduce task that gets the bulk of the work:
> >> SELECT ...
> >> FROM T
> >> LEFT OUTER JOIN S
> >> ON T.timestamp = S.timestamp and T.id = S.id
> >> This is a 1:0/1 join so the size of the output is exactly the same as the
> >> size of T (500M records). S is actually very small (5K).
> >> I've tried:
> >> - switching the order of the join conditions
> >> - using a different hash function setting (jenkins instead of murmur)
> >> - using SET hive.auto.convert.join = true;
> >
> > Are you sure your query was converted to a mapjoin? If not, try using
> > an explicit mapjoin hint.
> >
> >
> >> - using SET hive.optimize.skewjoin = true;
> >> but nothing helped :(
> >> Anything else I can try?
> >> Thanks!
> >
>
>
>
> --
> Regards,
> Bharath .V
> w:http://research.iiit.ac.in/~bharath.v
>
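
To illustrate why replicating the small table removes the skewed reduce task, here is a minimal Python sketch of a map-side left outer join. It is a conceptual model only, not Hive's implementation: the function names, the tuple layout, and the sample rows are all hypothetical, and it assumes each (timestamp, id) pair occurs at most once in S, as the 1:0/1 description in the thread implies. In Hive itself the explicit hint mentioned above is written as /*+ MAPJOIN(S) */ right after SELECT.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Join key (timestamp, id), mirroring the ON clause quoted in the thread.
Key = Tuple[int, int]


def build_lookup(small_rows: Iterable[Tuple[int, int, str]]) -> Dict[Key, str]:
    """Load the small table S into an in-memory hash table.

    Assumption: each key appears at most once in S (a 1:0/1 join).
    """
    return {(ts, id_): payload for ts, id_, payload in small_rows}


def map_side_left_outer_join(
    big_rows: Iterable[Tuple[int, int, str]],
    lookup: Dict[Key, str],
) -> Iterator[Tuple[int, int, str, Optional[str]]]:
    """Stream T once and probe the local copy of S for every row.

    No row is routed by key to a reducer, so a hot key cannot
    overload a single task. None stands in for SQL NULL.
    """
    for ts, id_, value in big_rows:
        yield ts, id_, value, lookup.get((ts, id_))


# Hypothetical sample data: key (100, 1) is deliberately repeated in T.
S = [(100, 1, "s-a"), (200, 2, "s-b")]
T = [(100, 1, "t1"), (100, 1, "t2"), (300, 3, "t3")]

result = list(map_side_left_outer_join(T, build_lookup(S)))
```

Every row of T produces exactly one output row, so the output has the same number of rows as T, and each mapper would do work proportional only to its own slice of T.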
