You should have bucket columns = join keys = sort columns. When this 
condition held, I was able to make SMB (sort merge bucket) join work.
If even one of the join keys is a partition column (i.e. it cannot be part 
of the clustering/sorting set), it did not work for me.
So I'd say just check that all 7 tables are joined on the same join keys, 
and that those keys are all clustered/sorted.
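
To make the condition concrete, here is a minimal sketch with two of the tables. The table and column names (t1, t2, t1_raw, t2_raw, k1, k2, v1, v2) are hypothetical, and the hive.enforce.* settings and the MAPJOIN hint are my assumptions about what a Hive release of this era needs; they are not taken from the original question.

```sql
-- Hypothetical tables and columns, for illustration only.
-- Both tables are bucketed AND sorted on the full set of join keys (k1, k2),
-- with the same number of buckets.
CREATE TABLE t1 (k1 BIGINT, k2 STRING, v1 STRING)
CLUSTERED BY (k1, k2) SORTED BY (k1 ASC, k2 ASC) INTO 20 BUCKETS;

CREATE TABLE t2 (k1 BIGINT, k2 STRING, v2 STRING)
CLUSTERED BY (k1, k2) SORTED BY (k1 ASC, k2 ASC) INTO 20 BUCKETS;

-- Assumption: these settings make Hive write bucketed, sorted files on insert.
SET hive.enforce.bucketing = true;
SET hive.enforce.sorting = true;

-- t1_raw and t2_raw are hypothetical unbucketed source tables.
INSERT OVERWRITE TABLE t1 SELECT k1, k2, v1 FROM t1_raw;
INSERT OVERWRITE TABLE t2 SELECT k1, k2, v2 FROM t2_raw;

-- Settings from the original question.
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
SET hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Assumption: the MAPJOIN hint is needed for the bucket map join to be considered.
-- The join condition uses exactly the bucket/sort columns.
SELECT /*+ MAPJOIN(b) */ a.k1, a.k2, a.v1, b.v2
FROM t1 a
JOIN t2 b ON a.k1 = b.k1 AND a.k2 = b.k2;
```

Running EXPLAIN on the join is one way to check which join operator the planner actually chose.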
 
Sincerely,


Ameet


________________________________
 From: Bruce Bian <weidong....@gmail.com>
To: user@hive.apache.org 
Sent: Tuesday, May 22, 2012 11:07 AM
Subject: Condition for doing a sort merge bucket map join
 

Hi,
I've got 7 large tables to join (each ~10G in size) into one table, all with the 
same 2 join keys. I've read some documents on sort merge bucket map join, but 
failed to trigger it.
I've bucketed all 7 tables into 20 buckets and sorted them by one of the join 
keys.
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; 
I set the parameters above while doing the join.
What else am I missing? Do I have to bucket on both of the join keys (I'm currently 
trying this)? And does each bucket file have to be smaller than one HDFS block?
Thanks a lot.
