[ https://issues.apache.org/jira/browse/HIVE-20504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606919#comment-16606919 ]
Zoltan Haindrich commented on HIVE-20504:
-----------------------------------------

[~gopalv] this is not just about bmj; consider the following case:
* 2 tables with roughly the same data size - both fit into memory
* estimated buckets > 1 (enables that logic)
* numLlap nodes came out >= 3
* dphj is selected on the basis of network cost
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L242

I've made some small measurements for this, and it looked like mj finished faster, but the dataset I measured on may have been too small. I'll repeat it with a bigger set.

> Give simple MJ bigger priority than bucketized ones
> ---------------------------------------------------
>
>                 Key: HIVE-20504
>                 URL: https://issues.apache.org/jira/browse/HIVE-20504
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-20504.01.patch, HIVE-20504.01.patch,
> HIVE-20504.01wip01.patch, HIVE-20504.01wip01.patch
>
>
> From the code it seems the "standard" mapjoin is one of the last ones tried;
> in case the table is estimated to be bucketed into 2, but it's small, Hive will
> do a bucketmapjoin or dphj, even though a simple mapjoin could have been an
> alternative.
> https://github.com/apache/hive/blob/154ca3e3b5eb78cd49a4b3650c750ca731fba7da/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L157

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)