[ 
https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971177#action_12971177
 ] 

Namit Jain commented on HIVE-1695:
----------------------------------

Sorry for the delay in responding to this.

@Sreekanth, after https://issues.apache.org/jira/browse/HIVE-1642, we are 
planning to slowly deprecate/ignore the MAPJOIN
hint, and do all the optimizations at runtime.

Today, a join followed by a group by is run as 2 MR jobs, or as 1 map-only job 
followed by 1 MR job if HIVE-1642 decides to convert the join into a map-join.
Your approach is certainly more efficient.
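
To make the job counts concrete, here is a rough sketch of the three plans 
for a join followed by a group by. It is a toy model only, not Hive's planner: 
the function name, its parameters, and the job labels are made up for 
illustration.

```python
# Toy model of the job counts discussed in this issue; NOT Hive's planner.
# plan_jobs and its parameters are hypothetical names used for illustration.
def plan_jobs(join_is_map_join, merge_map_join_into_next_job=False):
    """Return the list of jobs for a join followed by a group by."""
    if not join_is_map_join:
        # A common (reduce-side) join needs its own MR job,
        # and the group by needs a second one.
        return ["MR: join", "MR: group by"]
    if merge_map_join_into_next_job:
        # Proposal in this issue: run the map-join inside the mappers
        # of the job whose reducers do the group by.
        return ["MR: map-join + group by"]
    # Current behaviour: a map-only job for the map-join, then an MR job.
    return ["map-only: map-join", "MR: group by"]
```

Under these assumptions, plan_jobs(True, merge_map_join_into_next_job=True) 
yields a single job, where the other two plans yield two jobs each.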

What is your use case? Are you concerned about a join followed by a group by 
where the join key is the same as the group-by key?
Or are you concerned about a join followed by any operator that leads to a 
reduce-sink?

As Joy said above, it is very important to carefully tune the memory for the 
map-join, because the code assumes that no other memory-consuming operations 
are going on. The only exception to this rule so far was HIVE-1830.

We should not do any optimizations specific to map-join, but rather for 
general joins which may be converted to map-joins at runtime.

> MapJoin followed by ReduceSink should be done as single MapReduce Job
> ---------------------------------------------------------------------
>
>                 Key: HIVE-1695
>                 URL: https://issues.apache.org/jira/browse/HIVE-1695
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: hive-1695-1.patch, hive-1695.patch
>
>
> Currently MapJoin followed by ReduceSink runs as two MapReduce jobs : One map 
> only job followed by a Map-Reduce job. It can be combined into single 
> MapReduce Job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
