Hi all,

        I was trying out some hive optimization features and encountered such 
problem: I cannot use bucket map join in hive 0.12. After all the setting I 
tried below, only one hashtable file is generated and the join turn out to be 
just map join.

        I have two tables both in rcfile format and both bucktized into 10 
bucket, they are created and populated as follows(Origin data was generated 
from TPC-H):

        hive> create table lsm (l_orderkey int, l_partkey int, l_suppkey int, 
l_linenumber int, l_quantity double, l_extendedprice double, l_discount double, 
l_tax double, l_returnflag string, l_linestatus string, l_shipdate string, 
l_commitdate string, l_receiptdate string, l_shipstruct string, l_shipmode 
string, l_comment string) clustered by (l_orderkey) into 10 buckets stored as 
rcfile;
        hive> create table osm (o_orderkey int, o_custkey int) clustered by 
(o_orderkey) into 10 buckets stored as rcfile;
        hive> set hive.enforce.bucketing=true;
        hive> insert overwrite table lsm select * from orili;
        hive> insert overwrite table osm select o_orderkey, o_custkey from 
orior;

        And I can query both table’s data normally, and lsm is 790MB, osm is 
11MB, then I want to try bucket map join:

        hive> set hive.auto.convert.join=true; 
        hive> set hive.optimize.bucketmapjoin=true;
        hive> set hive.enforce.bucketmapjoin=true;
        hive> set hive.auto.convert.join.noconditionaltask=true;
        hive> set 
hive.auto.convert.join.noconditionaltask.size=1000000000000000;
        hive> set 
hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

        and my query is as follows:

        hive> select /*+ Mapjoin(osm) */ osm.o_orderkey, lsm.* from osm join 
lsm on osm.o_orderkey = lsm.l_orderkey;

    This query only generate 1 hashtable of osm and fall back to a map join, I 
was really confused about it. Do I have all the hint set to enable the bucket 
map join feature, or are there any problems in my query ? 

Thanks,
Henry

        

Reply via email to