[ https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966247#action_12966247 ]
Joydeep Sen Sarma commented on HIVE-1695:
-----------------------------------------

A couple of things to watch out for:

- Mapjoin uses a lot of memory on the mapper. I am not sure how the memory settings are controlled, but we need to make sure that the map-join and the sort (imposed by the ReduceSink) don't blow through the task heap limits. If the RS is there because of a group by, the map-side hash aggregation will also use memory.
- The stuff that Liyin has been working on converts regular joins into map joins automatically. I believe he generates several plans (map-join and sort-merge join) and chooses one of them at runtime. Will the technique discussed here apply to map-join plans generated by auto-map-joins? (I am not sure, so asking.)

> MapJoin followed by ReduceSink should be done as single MapReduce Job
> ---------------------------------------------------------------------
>
>                 Key: HIVE-1695
>                 URL: https://issues.apache.org/jira/browse/HIVE-1695
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>
> Currently MapJoin followed by ReduceSink runs as two MapReduce jobs: one
> map-only job followed by a Map-Reduce job. It can be combined into a single
> MapReduce job.