[ https://issues.apache.org/jira/browse/PIG-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042813#comment-14042813 ]
Rohini Palaniswamy commented on PIG-3365:
-----------------------------------------

Instead of this approach, which goes about setting mapreduce.job.ubertask.maxbytes, can we enable uber mode if there is only 1 input split after Pig has combined the splits? And of course we should not set it if it is not a file-based storage. One doubt, though, is whether it will take effect if we set it in getInputSplits, which is called by the JobClient. If it does not, we can go with this approach and use pig.maxCombinedSplitSize instead of the default block size to keep it simple and neat.

A few other issues I see:
- The current code is within okToRunLocal, so uber mode will never take effect if auto.local is not enabled.
- I don't like the idea of getting the input file sizes multiple times. That will add a lot of stress to the NameNode. We need to fix that even for auto local mode.

> Run as uber job if there is only one input split
> ------------------------------------------------
>
>                 Key: PIG-3365
>                 URL: https://issues.apache.org/jira/browse/PIG-3365
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Lorand Bendig
>              Labels: Performance
>         Attachments: PIG-3365.patch
>
>
> Hadoop 2 has support for uber mode (mapreduce.job.ubertask.enable=true), which
> runs the map and reduce on the Application Master itself and reduces the overhead
> of launching a separate map/reduce task.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
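A minimal Java sketch of the two suggestions in the comment, under stated assumptions: `maybeEnableUber` and `InputSizeCache` are hypothetical names, not Pig's actual code. `java.util.Properties` stands in for Hadoop's `Configuration` so the example runs without Hadoop on the classpath, and the size lookup is a stub where real code would query the FileSystem (and therefore the NameNode). Only `mapreduce.job.ubertask.enable` comes from the issue itself. The sketch does not settle the open question of whether a setting made in getInputSplits still takes effect.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.function.ToLongFunction;

public class UberModeSketch {
    // Real Hadoop 2 property quoted in the issue description.
    static final String UBER_ENABLE = "mapreduce.job.ubertask.enable";

    // Hypothetical helper, not Pig's API: turn on uber mode only when Pig's
    // split combination left exactly one split and the input is file based.
    // Properties stands in for Hadoop's Configuration in this sketch.
    static boolean maybeEnableUber(Properties conf, int combinedSplitCount,
                                   boolean fileBasedInput) {
        boolean uber = fileBasedInput && combinedSplitCount == 1;
        if (uber) {
            conf.setProperty(UBER_ENABLE, "true");
        }
        return uber; // conf is left untouched when uber mode is not chosen
    }

    // Hypothetical cache: each input path's size is looked up at most once,
    // so repeated checks (uber mode, auto local mode) reuse the first answer.
    static final class InputSizeCache {
        private final Map<String, Long> sizes = new HashMap<>();
        private final ToLongFunction<String> lookup; // stub for a FileSystem size query
        int lookups = 0; // number of calls that reached the underlying lookup

        InputSizeCache(ToLongFunction<String> lookup) {
            this.lookup = lookup;
        }

        long sizeOf(String path) {
            return sizes.computeIfAbsent(path, p -> {
                lookups++;
                return lookup.applyAsLong(p);
            });
        }

        long totalSize(List<String> paths) {
            long total = 0;
            for (String p : paths) {
                total += sizeOf(p);
            }
            return total;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // A single combined split over file-based input: uber mode is enabled.
        System.out.println(maybeEnableUber(conf, 1, true)); // true
        System.out.println(conf.getProperty(UBER_ENABLE));  // true

        // Made-up sizes; real code would ask the FileSystem instead.
        InputSizeCache cache = new InputSizeCache(path -> path.length() * 1000L);
        List<String> inputs = List.of("/data/a", "/data/b");
        cache.totalSize(inputs);
        cache.totalSize(inputs); // second pass is answered from the cache
        System.out.println(cache.lookups); // 2
    }
}
```

Keying the cache by path means each file's size is fetched at most once per job no matter how many decisions (uber mode, auto local mode) need the total, which is the concern raised in the second bullet.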