I faced the same situation: two tables with 3 billion records on each side, both partitioned and sorted on the same key. I set the following parameters in the Hive query, assuming the join would happen in the map phase.
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.enforce.sorting=true;

I am using Hive version 0.13 and the storage format is ORC. One of the tables is small, but I haven't checked whether it fits in the cache, since we have a lot of memory. The map-side join still didn't happen. What could be the reason?

> On Jan 29, 2015, at 7:38 AM, matshyeq <matsh...@gmail.com> wrote:
>
> I do have two tables partitioned on the same criteria.
> Could I still take advantage of Bucket Map Join or better, Sort Merge Bucket
> Map Join?
> How?
>
> ~Maciek