[ https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971177#action_12971177 ]
Namit Jain commented on HIVE-1695:
----------------------------------

Sorry for the delay in responding to this.

@Sreekanth, after https://issues.apache.org/jira/browse/HIVE-1642, we are planning to slowly deprecate/ignore the MAPJOIN hint and do all the optimizations at runtime.

Today, a join followed by a group by is run as 2 MR jobs, or as 1 map-only job followed by 1 MR job if HIVE-1642 decides to convert the join into a map-join. Your approach is certainly more optimal.

What is your use case? Are you concerned about a join followed by a group by where the join key is the same as the group-by key? Or are you concerned about a join followed by any operator which leads to a reduce-sink?

As Joy said above, it is very important to tune the memory for the map-join carefully, because the code assumes that no other memory-consuming operations are going on. The only exception to this rule so far was HIVE-1830. We should not do any optimizations specific to the MAPJOIN hint, but rather for general joins which may be converted to map-joins at runtime.

> MapJoin followed by ReduceSink should be done as single MapReduce Job
> ---------------------------------------------------------------------
>
>                 Key: HIVE-1695
>                 URL: https://issues.apache.org/jira/browse/HIVE-1695
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: hive-1695-1.patch, hive-1695.patch
>
>
> Currently MapJoin followed by ReduceSink runs as two MapReduce jobs : One map
> only job followed by a Map-Reduce job. It can be combined into single
> MapReduce Job.
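To make the job counts above concrete, here is a toy model of how a linear operator pipeline splits into jobs. It is a sketch under stated assumptions, not Hive's planner: the class JobCountSketch, the enum Op and the method planJobs are hypothetical names invented for illustration. The assumed rules are that one job has at most one shuffle (so a second ReduceSink after a reducer-side operator starts a new job), and that when splitAfterMapJoin is true, the job containing a MapJoin ends right after it, mimicking the current two-job plan described in this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these operator kinds and job-splitting rules are
// hypothetical simplifications, not Hive's actual planner classes.
class JobCountSketch {
    enum Op { TABLE_SCAN, REDUCE_SINK, JOIN, MAP_JOIN, GROUP_BY }

    // Splits a linear pipeline into jobs and labels each one.
    // Assumed rules:
    //  - a REDUCE_SINK adds a shuffle; a second REDUCE_SINK after a
    //    reducer-side operator needs a new job (one shuffle per job);
    //  - if splitAfterMapJoin is true, the job containing a MAP_JOIN
    //    ends right after it (the current behavior in this issue).
    // A MAP_JOIN appearing after a shuffle is not modeled.
    static List<String> planJobs(List<Op> pipeline, boolean splitAfterMapJoin) {
        List<String> jobs = new ArrayList<>();
        boolean hasReduce = false;    // current job has a reduce phase
        boolean inReduce = false;     // operators now run in the reducer
        boolean splitPending = false; // close current job before next operator
        for (Op op : pipeline) {
            if (splitPending) {
                jobs.add(hasReduce ? "map-reduce" : "map-only");
                hasReduce = false;
                inReduce = false;
                splitPending = false;
            }
            switch (op) {
                case REDUCE_SINK:
                    if (inReduce) {
                        // Already past a shuffle: finish this job, start a new one.
                        jobs.add("map-reduce");
                    }
                    hasReduce = true;
                    inReduce = true;
                    break;
                case MAP_JOIN:
                    if (splitAfterMapJoin) {
                        splitPending = true;
                    }
                    break;
                default:
                    break; // TABLE_SCAN, JOIN, GROUP_BY do not move job boundaries here
            }
        }
        jobs.add(hasReduce ? "map-reduce" : "map-only");
        return jobs;
    }

    public static void main(String[] args) {
        List<Op> commonJoinThenGroupBy =
            List.of(Op.TABLE_SCAN, Op.REDUCE_SINK, Op.JOIN, Op.REDUCE_SINK, Op.GROUP_BY);
        List<Op> mapJoinThenGroupBy =
            List.of(Op.TABLE_SCAN, Op.MAP_JOIN, Op.REDUCE_SINK, Op.GROUP_BY);

        System.out.println(planJobs(commonJoinThenGroupBy, true)); // [map-reduce, map-reduce]
        System.out.println(planJobs(mapJoinThenGroupBy, true));    // [map-only, map-reduce]
        System.out.println(planJobs(mapJoinThenGroupBy, false));   // [map-reduce]
    }
}
```

In this model the proposal in the issue corresponds to running with splitAfterMapJoin set to false: the map-join stays in the map phase and its output feeds the ReduceSink of the same job, so the join plus group by collapses into a single MapReduce job.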