[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575336#comment-13575336 ]
Ashutosh Chauhan commented on HIVE-3403:
----------------------------------------

Thinking more about my point a) above, there are three potential join optimization opportunities:

a) Convert a JoinOperator to a non-bucketed MapJoinOperator.
b) Convert a JoinOperator to a bucketed MapJoinOperator.
c) Convert a JoinOperator to a sort-merge-bucketed MapJoinOperator.

Among these, c) doesn't need to buffer data in memory, so it can be determined completely at compile time, which this patch enables. a) and b) buffer data in memory, so they need to be done at run time. a) is already taken care of in HIVE-3784, so we are left with b) now.

With this patch, we will convert a JoinOperator to a bucketed MapJoinOperator at compile time by attempting to convert a map-join operator (which will be there because the user provided the hint). But ideally this should also be done at run time, just like a). At run time we should first see if the tables are bucketed, then check whether the required buckets of the smaller table fit in memory, and if they do, convert the JoinOperator to a BMJ. If the table is not bucketed, then check the size of the whole small table and, if it fits, convert the join into a non-bucketed map-join.

If we do this, then we can completely get rid of map-join hints. That will be advantageous to users, since they will never have to provide hints in their queries; the Hive optimizer will generate the most optimal plan possible. It will be advantageous to Hive devs, since they will never have to bother about map-join operators in the query compilation phase, because a map-join operator will never be part of the plan at compile time. It will only appear at run time, if a JoinOperator is optimized to a MapJoinOperator. This will simplify semantic analysis, plan generation and compile-time optimizations a lot.

Namit, is this analysis correct?
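The run-time check proposed above can be sketched roughly as follows. This is only an illustration of the decision order, not Hive code: the names `SmallTable`, `JoinPlan`, `choose_join_plan` and `memory_budget_bytes` are hypothetical, and falling back to the plain JoinOperator when nothing fits in memory is an assumption, since the comment does not spell out that case.

```python
from dataclasses import dataclass
from enum import Enum


class JoinPlan(Enum):
    JOIN = "JoinOperator (left unchanged)"
    MAP_JOIN = "non-bucketed MapJoinOperator"
    BUCKET_MAP_JOIN = "bucketed MapJoinOperator (BMJ)"


@dataclass
class SmallTable:
    """Hypothetical run-time statistics for the smaller side of the join."""
    total_bytes: int                # size of the whole small table
    bucketed: bool                  # tables are bucketed so that a BMJ is possible
    required_bucket_bytes: int = 0  # size of the buckets one mapper must load


def choose_join_plan(small: SmallTable, memory_budget_bytes: int) -> JoinPlan:
    """Pick a join plan at run time, in the order described in the comment."""
    if small.bucketed:
        # Bucketed case: only the required buckets must fit in memory.
        if small.required_bucket_bytes <= memory_budget_bytes:
            return JoinPlan.BUCKET_MAP_JOIN
        # Assumption: if the buckets do not fit, keep the original join.
        return JoinPlan.JOIN
    # Non-bucketed case: the whole small table must fit in memory.
    if small.total_bytes <= memory_budget_bytes:
        return JoinPlan.MAP_JOIN
    # Assumption: nothing fits, so the JoinOperator is left as it is.
    return JoinPlan.JOIN
```

With a budget of 1000 bytes, for example, a bucketed table of 10000 bytes whose required buckets take 100 bytes would become a BMJ, while the same table without bucketing would stay a plain join. No hint is consulted at any point, which is the property the comment is after.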
> user should not specify mapjoin to perform sort-merge bucketed join
> -------------------------------------------------------------------
>
>                 Key: HIVE-3403
>                 URL: https://issues.apache.org/jira/browse/HIVE-3403
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3403.10.patch, hive.3403.11.patch,
> hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch,
> hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch,
> hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch,
> hive.3403.21.patch, hive.3403.22.patch, hive.3403.23.patch,
> hive.3403.24.patch, hive.3403.25.patch, hive.3403.26.patch,
> hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch,
> hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch
>
>
> Currently, in order to perform a sort merge bucketed join, the user needs
> to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the
> mapjoin hint.
> The user should not specify any hints.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira