Hi Bruce,

Instead of joining 7 tables in the query, can you please start off with 2 tables and see if that works? If it doesn't, feel free to paste your table definitions and join query, along with any properties you are setting, and folks on the mailing list can take a stab at it.
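A minimal sketch of such a two-table test, in case it helps. All table and column names (orders_a, orders_b, staging_a, staging_b, k1, k2, val) are hypothetical placeholders. It assumes both tables are bucketed and sorted on both join keys with the same bucket count, which as far as I know is what a sort merge bucket join expects, and that the Hive version in use still needs the MAPJOIN hint and the hive.enforce.* settings:

```sql
-- Hypothetical names throughout; adjust to the real schema.
-- Both tables are bucketed AND sorted on the same join keys,
-- with the same number of buckets.
CREATE TABLE orders_a (k1 BIGINT, k2 BIGINT, val STRING)
CLUSTERED BY (k1, k2) SORTED BY (k1, k2) INTO 20 BUCKETS;

CREATE TABLE orders_b (k1 BIGINT, k2 BIGINT, val STRING)
CLUSTERED BY (k1, k2) SORTED BY (k1, k2) INTO 20 BUCKETS;

-- Ask Hive to actually write sorted bucket files when loading
-- (assumption: a Hive version where these settings are not yet the default).
SET hive.enforce.bucketing = true;
SET hive.enforce.sorting = true;
INSERT OVERWRITE TABLE orders_a SELECT k1, k2, val FROM staging_a;
INSERT OVERWRITE TABLE orders_b SELECT k1, k2, val FROM staging_b;

-- The three settings from the original message.
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
SET hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Inspect the plan before running the join itself.
EXPLAIN
SELECT /*+ MAPJOIN(b) */ a.k1, a.k2, a.val, b.val
FROM orders_a a
JOIN orders_b b ON a.k1 = b.k1 AND a.k2 = b.k2;
```

If the EXPLAIN output shows a sorted merge bucket map join operator rather than a plain map join or common join, the optimization has kicked in, and the remaining tables can be added one at a time.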
Mark

----- Original Message -----
From: "Bruce Bian" <weidong....@gmail.com>
To: user@hive.apache.org
Sent: Tuesday, May 22, 2012 11:07:38 AM
Subject: Condition for doing a sort merge bucket map join

Hi,

I've got 7 large tables (each ~10G in size) to join into one table, all with the same 2 join keys. I've read some documents on sort merge bucket map join, but failed to trigger it. I've bucketed all 7 tables into 20 buckets, sorted them by one of the join keys, and set the following parameters while doing the join:

set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

What else am I missing? Do I have to bucket on both of the join keys (I'm currently trying this)? And does each bucket file have to be smaller than one HDFS block?

Thanks a lot.