Amir Youssefi created HIVE-4011:
-----------------------------------

             Summary: Sort Merge Join does not kick-in
                 Key: HIVE-4011
                 URL: https://issues.apache.org/jira/browse/HIVE-4011
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.10.0, 0.9.0
         Environment: Linux
            Reporter: Amir Youssefi


After applying the settings required for Sort Merge Join, a join on two 
bucketed and partitioned tables does not use it. The query falls back to 
MapJoin with a local first step instead.

I ran into the issue on Hive 0.9 at large scale. To make sure the issue 
persists, I ran it on Hive 0.10 with sample public data and regular storage 
formats.

More details:

set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

select /*+ MAPJOIN(l) */
l.stock_price_open lo,
r.stock_price_open ro
from nyse_stocks_pcsb l
JOIN nyse_stocks_pcsb_dup r
  ON (l.year = r.year
      and l.stock_symbol = r.stock_symbol
      and l.dte = r.dte)
where ...

DDL:

(both tables)
PARTITIONED BY (year string)
CLUSTERED BY (stock_symbol) SORTED BY (stock_symbol) INTO 4 BUCKETS
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'


I also made sure the following were set:

set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
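
A minimal sketch of how the chosen join strategy can be checked, assuming the 
settings listed above are already applied in the session: prefix the query with 
EXPLAIN and inspect the operator tree. The WHERE clause is left out because it 
is elided in this report. A sort-merge join would be expected to appear as a 
"Sorted Merge Bucket Map Join Operator", while the fallback shows a plain 
"Map Join Operator" preceded by a local work stage. The exact operator labels 
may differ between Hive versions.

```sql
-- Sketch only: same join as in the report, with the elided WHERE clause omitted.
-- Assumes the hive.* settings listed above are set in this session.
EXPLAIN
select /*+ MAPJOIN(l) */
  l.stock_price_open lo,
  r.stock_price_open ro
from nyse_stocks_pcsb l
JOIN nyse_stocks_pcsb_dup r
  ON (l.year = r.year
      and l.stock_symbol = r.stock_symbol
      and l.dte = r.dte);
```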

Run logs and more info in attached file.
