[ 
https://issues.apache.org/jira/browse/HIVE-20504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606919#comment-16606919
 ] 

Zoltan Haindrich commented on HIVE-20504:
-----------------------------------------

[~gopalv] this is not just about bmj; consider the following case:

* 2 tables with roughly the same data size - both fit into memory
* estimated buckets > 1 (enables that logic)
* the number of LLAP nodes came out >= 3
* dphj is selected on the basis of network cost

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L242
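
To make the case above concrete, here is a minimal Java sketch of how the order of checks changes the outcome. It is not the code from ConvertJoinMapJoin: the class, the method names, the JoinStats fields and the "nodes >= 3" rule are all hypothetical stand-ins (the last one for the network cost comparison), chosen only to mirror the four bullets.

```java
// Hypothetical sketch: none of these names come from ConvertJoinMapJoin.
class JoinChoiceSketch {
    enum JoinType { MAP_JOIN, BUCKET_MAP_JOIN, DPHJ, SHUFFLE_JOIN }

    /** Assumed planner inputs for one join; illustrative only. */
    static final class JoinStats {
        final long smallSideBytes;     // estimated size of the smaller input
        final long memoryBudgetBytes;  // memory available for a hash table
        final int estimatedBuckets;    // estimated bucket count
        final int llapNodes;           // number of LLAP nodes

        JoinStats(long smallSideBytes, long memoryBudgetBytes,
                  int estimatedBuckets, int llapNodes) {
            this.smallSideBytes = smallSideBytes;
            this.memoryBudgetBytes = memoryBudgetBytes;
            this.estimatedBuckets = estimatedBuckets;
            this.llapNodes = llapNodes;
        }

        boolean fitsInMemory() {
            return smallSideBytes <= memoryBudgetBytes;
        }
    }

    /** Bucketized variants are tried first; plain mapjoin is the fallback. */
    static JoinType bucketizedFirst(JoinStats s) {
        if (s.estimatedBuckets > 1) {
            // Assumption: ">= 3 nodes" stands in for "dphj wins on network cost".
            return s.llapNodes >= 3 ? JoinType.DPHJ : JoinType.BUCKET_MAP_JOIN;
        }
        return s.fitsInMemory() ? JoinType.MAP_JOIN : JoinType.SHUFFLE_JOIN;
    }

    /** Plain mapjoin gets priority whenever the small side fits in memory. */
    static JoinType mapJoinFirst(JoinStats s) {
        if (s.fitsInMemory()) {
            return JoinType.MAP_JOIN;
        }
        if (s.estimatedBuckets > 1) {
            return s.llapNodes >= 3 ? JoinType.DPHJ : JoinType.BUCKET_MAP_JOIN;
        }
        return JoinType.SHUFFLE_JOIN;
    }

    public static void main(String[] args) {
        // The scenario from the bullets: small tables, 2 buckets, 3 nodes.
        JoinStats s = new JoinStats(64L << 20, 256L << 20, 2, 3);
        System.out.println("bucketized first: " + bucketizedFirst(s));
        System.out.println("mapjoin first:    " + mapJoinFirst(s));
    }
}
```

With these made-up inputs the first ordering never reaches the plain mapjoin branch even though the data fits in memory, which is the situation the bullets describe.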

I ran a few small measurements for this, and mj appeared to finish faster, 
but the dataset may have been too small for the result to be meaningful.
I'll repeat the measurement with a bigger set.



> Give simple MJ bigger priority than bucketized ones
> ---------------------------------------------------
>
>                 Key: HIVE-20504
>                 URL: https://issues.apache.org/jira/browse/HIVE-20504
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-20504.01.patch, HIVE-20504.01.patch, 
> HIVE-20504.01wip01.patch, HIVE-20504.01wip01.patch
>
>
> From the code it seems the "standard" mapjoin is one of the last options 
> tried. If a table is estimated to be bucketed into 2 but is small, Hive will 
> do a bucketmapjoin or dphj, even though a simple mapjoin could have been an 
> alternative.
> https://github.com/apache/hive/blob/154ca3e3b5eb78cd49a4b3650c750ca731fba7da/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L157



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
