You should have bucket columns = join keys = sort columns. When this condition held, I was able to make the SMB join work. If even one of the join keys was a partition column (i.e., it cannot be part of the clustering/sorting set), it did not work for me. So I'd say just check that all 7 tables in the join use the same join keys, and that all of those keys are both clustered and sorted on.

Sincerely,
Ameet

________________________________
From: Bruce Bian <weidong....@gmail.com>
To: user@hive.apache.org
Sent: Tuesday, May 22, 2012 11:07 AM
Subject: Condition for doing a sort merge bucket map join

Hi,

I've got 7 large tables (each ~10 GB in size) to join into one table, all on the same 2 join keys. I've read some documents on the sort merge bucket map join, but failed to trigger it. I've bucketed all 7 tables into 20 buckets and sorted them by one of the join keys, and I set the following parameters while doing the join:

set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

What else am I missing? Do I have to bucket on both of the join keys (I'm currently trying this)? And does each bucket file have to be smaller than one HDFS block?

Thanks a lot.
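Below is a minimal HiveQL sketch of the table layout the reply recommends, shown for two of the tables. Both tables are clustered and sorted on exactly the two join keys and use the same bucket count. The table and column names (`t1`, `t2`, `k1`, `k2`, `v1`, `v2`) are hypothetical. The sketch assumes a Hive release of that period, where the join still needs an explicit `MAPJOIN` hint and the data must be written with bucketing and sorting enforced. Check the details against your own Hive version.

```sql
-- Hypothetical tables: bucket columns = join keys = sort columns,
-- with the same number of buckets in every table.
CREATE TABLE t1 (k1 STRING, k2 STRING, v1 STRING)
CLUSTERED BY (k1, k2) SORTED BY (k1, k2) INTO 20 BUCKETS;

CREATE TABLE t2 (k1 STRING, k2 STRING, v2 STRING)
CLUSTERED BY (k1, k2) SORTED BY (k1, k2) INTO 20 BUCKETS;

-- When populating the tables, have Hive write one sorted file per bucket.
set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;
-- INSERT OVERWRITE TABLE t1 SELECT ... ;  (source query elided)

-- Settings from the original question, plus a MAPJOIN hint on the join.
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

SELECT /*+ MAPJOIN(t2) */ t1.k1, t1.k2, t1.v1, t2.v2
FROM t1 JOIN t2 ON t1.k1 = t2.k1 AND t1.k2 = t2.k2;
```

Partition columns are not regular table columns, so they cannot appear in `CLUSTERED BY` or `SORTED BY`. This is consistent with the reply's observation that a join key which is also a partition column prevented the SMB join.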