Thanks for the response, Navis.

I tried the repro again from scratch, and this time it does not generate a
hash table. I may previously have had some setting that forced a map join.
The generated plan shows a conditional stage pointing to a plain map and
reduce stage.

At runtime, however, the query runs as an MR job with a reduce stage that
performs the join.

Shouldn't an SMB join result in a map-only job for tables bucketed and sorted
on the join column? Is there a size restriction on SMB joins (i.e., does SMB
join kick in only if bucket sizes are below some limit)?
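For reference, the map-only plan being asked about can be sketched as below. This is a minimal Python illustration, not Hive's implementation: the row shape and the helper names (bucketize, merge_join, smb_join) are hypothetical, and bucket assignment is assumed to be hash(key) mod the bucket count, with both tables using the same number of buckets. Because equal keys always land in the same bucket number, each bucket pair can be merged independently (one mapper per pair), and the merge buffers only the run of equal keys on one side instead of a hash table of a whole table.

```python
from typing import List, Tuple

# Hypothetical row shape for illustration: (join_key, payload)
Row = Tuple[int, str]
Joined = Tuple[int, str, str]


def bucket_of(key: int, num_buckets: int) -> int:
    # Assumption: bucket = (hash(key) masked to non-negative) mod num_buckets.
    return (hash(key) & 0x7FFFFFFF) % num_buckets


def bucketize(rows: List[Row], num_buckets: int) -> List[List[Row]]:
    """Split rows into buckets and sort each bucket on the join key,
    mimicking a table that is bucketed and sorted on the join column."""
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[0], num_buckets)].append(row)
    for b in buckets:
        b.sort(key=lambda r: r[0])
    return buckets


def merge_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Inner sort-merge join of two lists already sorted on the key.
    Both inputs are streamed; only the run of equal keys on the right is reused."""
    out: List[Joined] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of rows on the right sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    out.append((lk, left[i][1], r[1]))
                i += 1
            j = j_end
    return out


def smb_join(left_buckets: List[List[Row]], right_buckets: List[List[Row]]) -> List[Joined]:
    # Bucket i on the left can only match bucket i on the right, so each
    # pair could be handled by its own mapper; no hash table is built.
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both tables must have the same number of buckets")
    result: List[Joined] = []
    for lb, rb in zip(left_buckets, right_buckets):
        result.extend(merge_join(lb, rb))
    return result


if __name__ == "__main__":
    orders = [(3, "o1"), (1, "o2"), (3, "o3"), (7, "o4")]
    users = [(1, "alice"), (3, "bob"), (5, "carol")]
    print(smb_join(bucketize(orders, 4), bucketize(users, 4)))
    # prints [(1, 'o2', 'alice'), (3, 'o1', 'bob'), (3, 'o3', 'bob')]
```

The per-bucket loop is what makes the job map-only in principle: nothing has to be shuffled, because the bucketing and sort order already co-locate and order matching keys.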

Thanks.


On Sun, Aug 3, 2014 at 7:20 PM, Navis류승우 <navis....@nexr.com> wrote:

> I don't think hash table generation is needed for SMB joins. Could you
> check the output of EXPLAIN EXTENDED?
>
> Thanks,
> Navis
>
>
> 2014-07-31 4:08 GMT+09:00 Pala M Muthaia <mchett...@rocketfuelinc.com>:
>
> > +hive-users
> >
> >
> > On Tue, Jul 29, 2014 at 1:56 PM, Pala M Muthaia <mchett...@rocketfuelinc.com> wrote:
> >
> > > Hi,
> > >
> > > I am testing an SMB join between two large tables. The tables are
> > > bucketed and sorted on the join column. I notice that even though the
> > > table is large, Hive attempts to generate a hash table for the 'small'
> > > table locally, similar to a map join. Since the table is large in my
> > > case, the client runs out of memory and the query fails.
> > >
> > > I am using Hive 0.12 with the following settings:
> > >
> > > set hive.optimize.bucketmapjoin=true;
> > > set hive.optimize.bucketmapjoin.sortedmerge=true;
> > > set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> > >
> > > My test query does a simple join and a select, no subqueries/nested
> > > queries etc.
> > >
> > > I understand why a (bucket) map join requires hash table generation,
> > > but why is it needed for an SMB join? Shouldn't an SMB join just spin
> > > up one mapper for each bucket and perform a sort-merge join directly
> > > in the mapper?
> > >
> > >
> > > Thanks,
> > > pala
> > >
> >
>
