[ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575336#comment-13575336
 ] 

Ashutosh Chauhan commented on HIVE-3403:
----------------------------------------

Thinking more about my point a) above, there are three potential join 
optimization opportunities:
a) Convert a JoinOperator to non-bucketed MapJoinOperator.
b) Convert a JoinOperator to bucketed MapJoinOperator.
c) Convert a JoinOperator to sort-merge-bucketed MapJoinOperator.
 
Among these, c) doesn't need to buffer data in memory, so it can be decided 
completely at compile time, which this patch enables. a) and b) buffer data in 
memory, so they need to be decided at run time. a) is already taken care of in 
HIVE-3784. 
So, we are left with b) now. With this patch, we convert a Join Operator to a 
bucketed MapJoin Operator at compile time by attempting to convert a map-join 
operator (which will be there because the user provided the hint). But ideally 
this should also be done at run time, just like a). At run time we should first 
check whether the tables are bucketed. If they are, check whether the required 
buckets of the smaller table fit in memory, and if they do, convert the 
JoinOperator to a BMJ (bucketed map-join). If the table is not bucketed, check 
the size of the whole small table and, if it fits, convert the join into a 
non-bucketed map-join. If we do this, we can get rid of map-join hints 
completely. That will be advantageous to users, since they will never have to 
provide hints in their queries: the Hive optimizer will generate the best plan 
possible. It will be advantageous to Hive devs, since they will never have to 
bother about map-join operators in the query compilation phase, because a 
map-join operator will never be part of the plan at compile time. It will only 
appear at run time, if a Join Operator is optimized to a MapJoin Operator. 
This will simplify semantic analysis, plan generation and compile-time 
optimizations a lot.
Namit, is this analysis correct? 
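
To make the proposed run-time decision for a) and b) concrete, here is a rough 
sketch in Python. It is not Hive code: SmallTable, JoinStrategy, 
choose_join_strategy and the memory budget parameter are all hypothetical names 
invented for illustration. Falling back to the ordinary join when nothing fits 
in memory is also my assumption; the text above only describes the two 
conversions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JoinStrategy(Enum):
    # Hypothetical labels, not Hive operator classes.
    COMMON_JOIN = "plain JoinOperator (no conversion)"
    MAP_JOIN = "non-bucketed map-join"
    BUCKET_MAP_JOIN = "bucketed map-join (BMJ)"


@dataclass
class SmallTable:
    # Size of the whole small table, in bytes.
    total_bytes: int
    # Size of the buckets a single task would need to load, in bytes.
    # None means the table is not bucketed on the join keys.
    required_bucket_bytes: Optional[int] = None


def choose_join_strategy(small: SmallTable, memory_budget_bytes: int) -> JoinStrategy:
    """Pick a join strategy at run time, following the order described above."""
    if small.required_bucket_bytes is not None:
        # Bucketed: only the required buckets have to fit in memory.
        if small.required_bucket_bytes <= memory_budget_bytes:
            return JoinStrategy.BUCKET_MAP_JOIN
        # Assumption: if even the required buckets do not fit, keep the plain join.
        return JoinStrategy.COMMON_JOIN
    # Not bucketed: the whole small table has to fit in memory.
    if small.total_bytes <= memory_budget_bytes:
        return JoinStrategy.MAP_JOIN
    return JoinStrategy.COMMON_JOIN
```

For example, choose_join_strategy(SmallTable(10_000, 1_000), 2_000) picks the 
bucketed map-join even though the whole table would not fit, which is the case 
that a hint-free run-time conversion of b) would cover.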

                
> user should not specify mapjoin to perform sort-merge bucketed join
> -------------------------------------------------------------------
>
>                 Key: HIVE-3403
>                 URL: https://issues.apache.org/jira/browse/HIVE-3403
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3403.10.patch, hive.3403.11.patch, 
> hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, 
> hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, 
> hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, 
> hive.3403.21.patch, hive.3403.22.patch, hive.3403.23.patch, 
> hive.3403.24.patch, hive.3403.25.patch, hive.3403.26.patch, 
> hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, 
> hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch
>
>
> Currently, in order to perform a sort merge bucketed join, the user needs
> to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
> mapjoin hint.
> The user should not specify any hints.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
