[ https://issues.apache.org/jira/browse/PIG-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019935#comment-14019935 ]
Lorand Bendig commented on PIG-3365:
------------------------------------

If {{mapreduce.job.ubertask.enable}} is set, Hadoop uberizes the job only if several conditions are met, such as the number of M/R tasks, the CPU and memory settings, and whether the total input size <= {{mapreduce.job.ubertask.maxbytes}}, which defaults to {{dfs.blocksize}}.

In the special case where the user sets the max split size ({{mapred.max.split.size}} or {{mapreduce.input.fileinputformat.split.maxsize}}) below {{dfs.blocksize}} and this results in multiple input splits, Pig won't enable uberization. However, if {{mapreduce.job.ubertask.enable}} is set explicitly, Hadoop may still uberize the job, which can be confusing.

What if the decision on the Pig side were based on the total input size of the job instead? I'd suggest adding an {{opt.ubertask.hint}} property: if it is set and *total input size <= mapreduce.job.ubertask.maxbytes*, then Pig sets {{mapreduce.job.ubertask.enable}}. Hadoop then has the final word on whether or not to uberize the job (see the sketch below the quoted issue).

> Run as uber job if there is only one input split
> ------------------------------------------------
>
>          Key: PIG-3365
>          URL: https://issues.apache.org/jira/browse/PIG-3365
>      Project: Pig
>   Issue Type: Improvement
>     Reporter: Rohini Palaniswamy
>     Assignee: Lorand Bendig
>       Labels: Performance
>
> Hadoop 2 has support for uber mode (mapreduce.job.ubertask.enable=true) which
> runs the map and reduce on the Application Master itself and reduces the overhead
> of launching separate map/reduce tasks.
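To make the proposal concrete, here is a minimal sketch, assuming the hint property keeps the {{opt.ubertask.hint}} name proposed above and that the caller has already computed the job's total input size in bytes. The class and method names are illustrative only, not existing Pig code:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class UberTaskHintSketch {

    /** Requests uberization when the hint is on and the total input fits in maxbytes. */
    public static void maybeRequestUbertask(Configuration conf, long totalInputBytes) {
        // Do nothing unless the user explicitly set the (proposed) hint property.
        if (!conf.getBoolean("opt.ubertask.hint", false)) {
            return;
        }

        // Hadoop defaults mapreduce.job.ubertask.maxbytes to the DFS block size,
        // so fall back to dfs.blocksize (assumed 128 MB here) when it is unset.
        long blockSize = conf.getLongBytes("dfs.blocksize", 128L * 1024 * 1024);
        long maxBytes = conf.getLong("mapreduce.job.ubertask.maxbytes", blockSize);

        if (totalInputBytes <= maxBytes) {
            // This only *requests* uber mode; the MR ApplicationMaster still checks
            // task counts, memory and CPU before actually running the job uberized.
            conf.setBoolean("mapreduce.job.ubertask.enable", true);
        }
    }
}
{code}

With this approach Pig never has to reason about split counts at all; it only forwards the size check and leaves the final uberization decision to Hadoop.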