Re: (HIVE-1642) Convert join queries to map-join based on size of table/row

Liyin Tang Mon, 27 Dec 2010 23:57:05 -0800

Yes. Only execute one of the them.

On 27 December 2010 23:43, <ash...@gmail.com> wrote:


> A question about the design doc:
>
> "If one of the tables is large and others are small enough to run Map Join,
> then the Conditional Task will pick the corresponding Map Join Local Task
> to
> run."
> Here you pick one table as big, hash all other tables into memory by join
> key individually. If it works, it will be chosen. Can it be that if one of
> the rest candidate tasks (chosing a different table as big table) performs
> better (assuming), it will not be considered? Did I understand it right?
>

Thanks!
> Alicia
>
> On Mon, Dec 27, 2010 at 12:05 AM, Liyin Tang <liyint...@gmail.com> wrote:
>
> > Hi,
> > If multiple tables join on different join keys, it will be separated into
> > multiple MapRed Tasks.
> > Also the threshold of the small table file size means the sum of all the
> > small table.
> >
> > There is a documentation and a  slide about this feature:
> > http://www.slideshare.net/aiolos127/join-optimization-in-hive
> > <http://www.slideshare.net/aiolos127/join-optimization-in-hive>
> > http://wiki.apache.org/hadoop/Hive/JoinOptimization
> >
> > <http://wiki.apache.org/hadoop/Hive/JoinOptimization>Thanks
> >
> > On 26 December 2010 23:42, <ash...@gmail.com> wrote:
> >
> > > Thanks for the reply. I want to get clarification on this feature.
> > >
> > > If one of the two joining tables table t1 is smaller than 25M and is
> > > sharded, how does this feature work?
> >
> >
> > > Suppose there are joins on multiple tables such as t1, t2 and t3. If t1
> > and
> > > t2 are smaller than 25M  and co-located with joining key, you can
> stream
> > t3
> > > and map join them. What if t1 and t2 are not co-located with join keys?
> > > What
> > > if t1 and t2 together are bigger than 25M?
> > >
> > > Thanks in advance!
> > > On Thu, Dec 23, 2010 at 7:16 PM, Liyin Tang <liyint...@gmail.com>
> wrote:
> > >
> > > > Hi,
> > > > How large is t1 and t2 ?
> > > > if both of t1 and t2 is larger than 25M (a default threshold), the
> > query
> > > > processor will do the common join.
> > > >
> > > > Thanks
> > > > Liyin
> > > >
> > > > On 23 December 2010 18:50, <ash...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I set hive.auto.convert.join=true and run the following query:
> > > > >
> > > > > select t1.foo, count(t2.bar) from invites t1 join invites t2 on
> > > > > (t1.foo=t2.foo) group by t1.foo;
> > > > >
> > > > > I did not see it ran as map side join. Did I miss something? Is
> there
> > > any
> > > > > precondition for this feature to work?
> > > > >
> > > > > Thanks.
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards
> > > > -Liyin
> > > >
> > >
> >
> >
> >
> > --
> > Best Regards
> > -Liyin
> >
>



-- 
Best Regards
-Liyin

Re: (HIVE-1642) Convert join queries to map-join based on size of table/row

Reply via email to