[jira] [Created] (HIVE-18200) Bucket Map Join : Use correct algorithm to pick the big table

Deepak Jaiswal (JIRA) Fri, 01 Dec 2017 13:36:21 -0800

Deepak Jaiswal created HIVE-18200:
-------------------------------------

             Summary: Bucket Map Join : Use correct algorithm to pick the big table
                 Key: HIVE-18200
                 URL: https://issues.apache.org/jira/browse/HIVE-18200
             Project: Hive
          Issue Type: Bug
            Reporter: Deepak Jaiswal
            Assignee: Deepak Jaiswal



Currently, the algorithm that picks the big table is flawed because of the
complexity associated with n-way joins.
It can result in an OOM. Consider the following scenario:

CREATE TABLE tab_part (key int, value string) PARTITIONED BY (ds STRING)
CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE;
CREATE TABLE tab (key int, value string) PARTITIONED BY (ds STRING)
CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;

Let's say tab is 2 GB, tab_part is 500 MB, and noconditionaltasksize
(hive.auto.convert.join.noconditionaltask.size) is 200 MB. Bucket map join
should then not happen, because at least one hash table would be more than
250 MB, which may cause an OOM.
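
The arithmetic behind this scenario can be sketched as follows. This is only
an illustration, not Hive's implementation: the function names are
hypothetical, and it assumes that data is spread evenly across buckets, that
one bucket count divides the other, and that each per-task hash table is
compared against the threshold on its own. Under those assumptions the
small-table data one task must hash is the small table's size divided by the
smaller of the two bucket counts.

```python
# Hypothetical sketch; none of these names come from Hive's source code.
MB = 1000 * 1000


def per_task_hash_table_bytes(small_size, small_buckets, big_buckets):
    """Estimate the small-table data a single task has to hash.

    Assumes evenly filled buckets and that one bucket count divides the
    other, so each big-table bucket is joined against
    small_size / min(small_buckets, big_buckets) bytes.
    """
    return small_size // min(small_buckets, big_buckets)


def pick_big_table(tables, threshold):
    """Return the big-table candidate with the smallest worst-case hash
    table, or None if no candidate keeps every hash table within threshold.

    tables maps a table name to (size_in_bytes, bucket_count).
    """
    best_name, best_cost = None, None
    for name, (_, buckets) in tables.items():
        # Every other table becomes a hash table when `name` is streamed.
        costs = [
            per_task_hash_table_bytes(size, other_buckets, buckets)
            for other, (size, other_buckets) in tables.items()
            if other != name
        ]
        worst = max(costs, default=0)
        if worst <= threshold and (best_cost is None or worst < best_cost):
            best_name, best_cost = name, worst
    return best_name


# The scenario from this issue: tab is 2 GB in 2 buckets,
# tab_part is 500 MB in 4 buckets, and the threshold is 200 MB.
tables = {"tab": (2000 * MB, 2), "tab_part": (500 * MB, 4)}
choice = pick_big_table(tables, 200 * MB)
```

With these numbers, streaming tab leaves an estimated 250 MB of tab_part per
task and streaming tab_part leaves an estimated 1000 MB of tab per task, so
neither choice fits under 200 MB and the sketch returns None, meaning bucket
map join should not be chosen.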



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
