[jira] Commented: (HIVE-1695) MapJoin followed by ReduceSink should be done as single MapReduce Job

Joydeep Sen Sarma (JIRA) Thu, 02 Dec 2010 11:50:33 -0800

    [ https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966247#action_12966247 ]


Joydeep Sen Sarma commented on HIVE-1695:
-----------------------------------------

A couple of things to watch out for:

- Map-join uses a lot of memory on the mapper. I am not sure how the memory
settings are controlled, but we need to make sure that the map-join and the
sort (imposed by the ReduceSink) together don't blow through the task heap
limits. If the RS is there because of a group by, the map-side hash
aggregation will also use memory.
- The work Liyin has been doing converts regular joins into map-joins
automatically. I believe he generates several plans (map-join and sort-merge
join) and chooses one of them at runtime. Will the technique discussed here
apply to map-join plans generated by auto-map-join? (I am not sure, so I am
asking.)
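The memory concern in the first point can be framed as a simple budget check: once the two jobs are merged, one mapper must hold the map-join hash table, the buffer used to sort the ReduceSink output and, for a group by, the map-side hash aggregation table, all inside a single task heap. The Java sketch below assumes each consumer's size can be estimated up front; the class, method, headroom fraction and byte counts are hypothetical and are not Hive settings or defaults.

```java
// Hypothetical budget check for a merged MapJoin + ReduceSink mapper.
// All names and numbers are illustrative assumptions, not Hive config keys.
public class MapperMemoryBudget {

    /** Returns true if the estimated mapper-side consumers fit in the usable heap. */
    static boolean fitsInHeap(long heapBytes,
                              double usableFraction,      // headroom left for the JVM and other objects
                              long mapJoinHashTableBytes, // in-memory small-table hash table
                              long sortBufferBytes,       // buffer for sorting ReduceSink output
                              long hashAggregationBytes) { // 0 if there is no map-side group by
        long usable = (long) (heapBytes * usableFraction);
        long demand = mapJoinHashTableBytes + sortBufferBytes + hashAggregationBytes;
        return demand <= usable;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 1024 * mb; // assumed 1 GB task heap

        // Map-join alone: a 500 MB hash table fits under 90% of 1 GB.
        System.out.println(fitsInHeap(heap, 0.9, 500 * mb, 0, 0));

        // Adding a 200 MB sort buffer and 300 MB of hash aggregation does not.
        System.out.println(fitsInHeap(heap, 0.9, 500 * mb, 200 * mb, 300 * mb));
    }
}
```

A real planner would also need a policy for when the check fails, for example falling back to the two-job plan; that choice is left open here.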

> MapJoin followed by ReduceSink should be done as single MapReduce Job
> ---------------------------------------------------------------------
>
>                 Key: HIVE-1695
>                 URL: https://issues.apache.org/jira/browse/HIVE-1695
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>
> Currently MapJoin followed by ReduceSink runs as two MapReduce jobs: one
> map-only job followed by a MapReduce job. They can be combined into a single
> MapReduce job.
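The rewrite requested in the issue can be pictured as a toy plan transformation: a map-only job whose output is written to a temporary file and read straight back by the next job can instead run its operators inside that next job's mapper, which removes one job and the intermediate write/read. The `Job` record, the `fuse()` method and the operator strings below are hypothetical illustrations (records need Java 16+), not Hive's actual planner classes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of merging a map-only MapJoin job into the following job.
// Job, fuse() and the operator names are hypothetical, not Hive classes.
public class StageFusion {

    /** A MapReduce job: mapper operators, then (optionally) reducer operators. */
    record Job(List<String> mapOps, List<String> reduceOps) {
        boolean isMapOnly() {
            return reduceOps.isEmpty();
        }
    }

    static final String TMP_WRITE = "FileSink(tmp)";
    static final String TMP_READ = "TableScan(tmp)";

    /**
     * If the first job is map-only and the two jobs are linked only by a
     * temporary file, drop the temp write/read pair and run the first job's
     * operators in the second job's mapper. Otherwise return both jobs as-is.
     */
    static List<Job> fuse(Job first, Job second) {
        List<String> a = first.mapOps();
        List<String> b = second.mapOps();
        boolean linkedByTmp = !a.isEmpty() && !b.isEmpty()
                && a.get(a.size() - 1).equals(TMP_WRITE)
                && b.get(0).equals(TMP_READ);
        if (!first.isMapOnly() || !linkedByTmp) {
            return List.of(first, second); // plan left unchanged
        }
        List<String> fusedMap = new ArrayList<>(a.subList(0, a.size() - 1));
        fusedMap.addAll(b.subList(1, b.size()));
        return List.of(new Job(fusedMap, second.reduceOps()));
    }

    public static void main(String[] args) {
        Job mapJoin = new Job(List.of("TableScan(big)", "MapJoin(small)", TMP_WRITE), List.of());
        Job groupBy = new Job(List.of(TMP_READ, "ReduceSink"), List.of("GroupBy", "FileSink(out)"));
        System.out.println(fuse(mapJoin, groupBy));
    }
}
```

Running `main` prints a single job whose mapper runs the table scan, the map-join and the ReduceSink. That fused mapper is also where the memory use of the map-join and of the ReduceSink sort would add up.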

-- 
This message is automatically generated by JIRA.