[ https://issues.apache.org/jira/browse/PIG-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019935#comment-14019935 ]
Lorand Bendig commented on PIG-3365:
------------------------------------

If {{mapreduce.job.ubertask.enable}} is set, Hadoop uberizes the job only if several conditions are met, such as the number of M/R tasks, the CPU and memory settings, and whether the total input size <= {{mapreduce.job.ubertask.maxbytes}}, which defaults to {{dfs.blocksize}}.

In the special case where the user sets the max split size ({{mapred.max.split.size}} or {{mapreduce.input.fileinputformat.split.maxsize}}) below {{dfs.blocksize}} and this results in multiple input splits, Pig won't enable uberization. However, if {{mapreduce.job.ubertask.enable}} is set explicitly, Hadoop may still uberize the job, which can be confusing.

What if the decision on the Pig side were based on the total input size of the job instead? I'd suggest adding an {{opt.ubertask.hint}} property: if it is set and *total input size <= mapreduce.job.ubertask.maxbytes*, then Pig sets {{mapreduce.job.ubertask.enable}}. Hadoop then has the final word on whether or not to uberize the job (see the sketch below the quoted issue).

> Run as uber job if there is only one input split
> ------------------------------------------------
>
>          Key: PIG-3365
>          URL: https://issues.apache.org/jira/browse/PIG-3365
>      Project: Pig
>   Issue Type: Improvement
>     Reporter: Rohini Palaniswamy
>     Assignee: Lorand Bendig
>       Labels: Performance
>
> Hadoop 2 has support for uber mode (mapreduce.job.ubertask.enable=true) which
> runs the map and reduce on the Application Master itself and reduces the overhead
> of launching separate map/reduce tasks.
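To make the proposal concrete, here is a minimal sketch, assuming the hint property keeps the {{opt.ubertask.hint}} name proposed above and that the caller has already computed the job's total input size in bytes. The class and method names are illustrative only, not existing Pig code:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class UberTaskHintSketch {

    /** Requests uberization when the hint is on and the total input fits in maxbytes. */
    public static void maybeRequestUbertask(Configuration conf, long totalInputBytes) {
        // Do nothing unless the user explicitly set the (proposed) hint property.
        if (!conf.getBoolean("opt.ubertask.hint", false)) {
            return;
        }

        // Hadoop defaults mapreduce.job.ubertask.maxbytes to the DFS block size,
        // so fall back to dfs.blocksize (assumed 128 MB here) when it is unset.
        long blockSize = conf.getLongBytes("dfs.blocksize", 128L * 1024 * 1024);
        long maxBytes = conf.getLong("mapreduce.job.ubertask.maxbytes", blockSize);

        if (totalInputBytes <= maxBytes) {
            // This only *requests* uber mode; the MR ApplicationMaster still checks
            // task counts, memory and CPU before actually running the job uberized.
            conf.setBoolean("mapreduce.job.ubertask.enable", true);
        }
    }
}
{code}

With this approach Pig never has to reason about split counts at all; it only forwards the size check and leaves the final uberization decision to Hadoop.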