I faced the same situation: two tables with 3 billion records each, both 
partitioned and sorted on the same key. I set the following parameters in the 
Hive query, assuming the join would happen in the map phase. 

set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.enforce.sorting=true;
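
For context, these settings only apply when both tables are declared as 
bucketed and sorted on the join key in their DDL; partitioning on the same 
key is not enough on its own. A minimal sketch of such a layout follows. The 
table names, column names, partition column, and bucket count are all 
hypothetical and are not the actual schema from this thread.

```sql
-- Hypothetical tables; names, columns, and bucket count are illustrative only.
-- Both sides are bucketed AND sorted on the join key, with the same bucket count.
CREATE TABLE orders_big (
  cust_id BIGINT,
  amount  DOUBLE
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (cust_id) SORTED BY (cust_id ASC) INTO 64 BUCKETS
STORED AS ORC;

CREATE TABLE customers_small (
  cust_id BIGINT,
  name    STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (cust_id) SORTED BY (cust_id ASC) INTO 64 BUCKETS
STORED AS ORC;

-- Set while loading the tables so the files on disk match the declared layout.
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;

-- Inspect the plan to see which join strategy Hive actually picked.
EXPLAIN
SELECT o.cust_id, o.amount, c.name
FROM orders_big o
JOIN customers_small c
  ON o.cust_id = c.cust_id
WHERE o.dt = '2015-01-29' AND c.dt = '2015-01-29';
```

If the EXPLAIN output shows an ordinary reduce-side join rather than a 
sort-merge bucket map join, the table metadata (DESCRIBE FORMATTED) is a 
reasonable first place to look for missing bucket or sort columns.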

I am using Hive version 13 and the storage format is ORC. One of the tables is 
small, but I haven't checked whether it can fit in the cache, as we have a 
huge amount of memory. But the map-side join didn't happen. What could be the 
reason?


> On Jan 29, 2015, at 7:38 AM, matshyeq <matsh...@gmail.com> wrote:
> 
> I do have two tables partitioned on the same criteria.
> Could I still take advantage of Bucket Map Join or better, Sort Merge Bucket 
> Map Join?
> How?
> 
> ~Maciek
