Deepak Jaiswal created HIVE-18200:
-------------------------------------

             Summary: Bucket Map Join : Use correct algorithm to pick the big table
                 Key: HIVE-18200
                 URL: https://issues.apache.org/jira/browse/HIVE-18200
             Project: Hive
          Issue Type: Bug
            Reporter: Deepak Jaiswal
            Assignee: Deepak Jaiswal

Currently, the algorithm that picks the big table is flawed because of the complexity associated with n-way joins, and it can result in an OOM. Consider the following scenario:

CREATE TABLE tab_part (key int, value string) PARTITIONED BY(ds STRING) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE;
CREATE TABLE tab(key int, value string) PARTITIONED BY(ds STRING) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;

Let's say tab has a size of 2 GB, tab_part has a size of 500 MB, and noconditionaltasksize is 200 MB. Bucket map join should not happen in this case, because at least one hash table would be 250 MB or more, which may cause an OOM.
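
The arithmetic behind the 250 MB figure can be sketched as follows. This is only an illustration, not Hive's actual planner code: the function names are hypothetical, and it assumes that tab (2 buckets) is picked as the big table, so the 500 MB of tab_part is split across one hash table per bucket of tab. Even with a perfectly even split, the largest hash table cannot be smaller than the average.

```python
MB = 1024 * 1024


def largest_hash_table_lower_bound(small_table_bytes, big_table_buckets):
    """Smallest possible size of the largest hash table (hypothetical helper).

    Assumes the small table is divided into one hash table per bucket of
    the big table; the largest piece is at least the ceiling of the average.
    """
    return -(-small_table_bytes // big_table_buckets)  # ceiling division


def bucket_map_join_fits(small_table_bytes, big_table_buckets, task_size_bytes):
    """True only if the best-case largest hash table fits in the threshold."""
    bound = largest_hash_table_lower_bound(small_table_bytes, big_table_buckets)
    return bound <= task_size_bytes


# Scenario from this issue: tab_part (500 MB) hashed, tab has 2 buckets,
# noconditionaltasksize is 200 MB.
bound = largest_hash_table_lower_bound(500 * MB, 2)
fits = bucket_map_join_fits(500 * MB, 2, 200 * MB)
print(bound // MB, fits)
```

Under these assumptions the bound is 250 MB, which is above the 200 MB threshold, so the join should not be converted to a bucket map join.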