Re: Condition for doing a sort merge bucket map join

Mark Grover Tue, 22 May 2012 08:43:58 -0700

Hi Bruce,
Instead of joining 7 tables in the query, can you please start off with 2 
tables and see if that works? If it doesn't, feel free to paste your table 
definitions and join query, along with any properties you are setting, and folks 
on the mailing list can take a stab at it.
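For example, a two-table test might look like the minimal HiveQL sketch below. The table names `t1`/`t2`, the columns `k1`/`k2`/`val` and the output aliases are hypothetical placeholders. The sketch also assumes both tables have the same number of buckets and are bucketed and sorted on the join columns. The `/*+ MAPJOIN(b) */` hint is how older Hive releases requested a map-side join. Running `EXPLAIN` first is a cheap way to see whether the plan actually uses a sorted-merge bucket map join before launching the full query.

```sql
-- Settings from the original question (enable the sort merge bucket map join)
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Hypothetical two-table join on both join keys; inspect the plan before running it
EXPLAIN
SELECT /*+ MAPJOIN(b) */ a.k1, a.k2, a.val AS a_val, b.val AS b_val
FROM t1 a
JOIN t2 b ON (a.k1 = b.k1 AND a.k2 = b.k2);
```

If the plan still shows an ordinary reduce-side join, it is easier to narrow down the cause with these two tables before adding the other five back in.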

Mark

----- Original Message -----
From: "Bruce Bian" <weidong....@gmail.com>
To: user@hive.apache.org
Sent: Tuesday, May 22, 2012 11:07:38 AM
Subject: Condition for doing a sort merge bucket map join

Hi,
I've got 7 large tables (each ~10 GB in size) to join into one table, all on the 
same 2 join keys. I've read some documents on sort merge bucket map join, but 
failed to trigger it. 
I've bucketed all 7 tables into 20 buckets and sorted them by one of the join 
keys.
set hive.optimize.bucketmapjoin = true; 
set hive.optimize.bucketmapjoin.sortedmerge = true; 
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; 
I set the above parameters while doing the join. 
What else am I missing? Do I have to bucket on both of the join keys (I'm currently 
trying this)? And does each bucket file have to be smaller than one HDFS block? 
Thanks a lot.
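For reference, here is a minimal HiveQL sketch of the table setup described above. It uses the variant being tried, with bucketing and sorting on both join keys. The table `t1`, its unbucketed source `t1_staging` and the columns `k1`/`k2`/`val` are hypothetical. `CLUSTERED BY`/`SORTED BY` only declare the layout and do not change how data is written. In Hive releases of this era, `hive.enforce.bucketing` and `hive.enforce.sorting` make the `INSERT` actually produce one sorted file per bucket.

```sql
-- Hypothetical bucketed, sorted table; every table in the join would use the same bucket count
CREATE TABLE t1 (k1 STRING, k2 STRING, val STRING)
CLUSTERED BY (k1, k2) SORTED BY (k1 ASC, k2 ASC) INTO 20 BUCKETS;

-- Make the INSERT honor the bucketing and sort order declared above
set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;

INSERT OVERWRITE TABLE t1
SELECT k1, k2, val FROM t1_staging;
```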
