[ 
https://issues.apache.org/jira/browse/PIG-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042813#comment-14042813
 ] 

Rohini Palaniswamy commented on PIG-3365:
-----------------------------------------

Instead of this approach, which sets 
mapreduce.job.ubertask.maxbytes, can we enable uber mode only if there is 1 
input split after Pig has combined the splits? And of course we should not set 
it if the storage is not file based. One doubt, though, is whether it will 
take effect when we set it in getInputSplits, which is called by the JobClient. 
If it does not, we can go with this approach and use 
pig.maxCombinedSplitSize instead of the default block size to keep it simple 
and neat. 
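
To make the proposed rule concrete, here is a minimal sketch of the decision 
only. It is not Pig's actual code: the class and method names are hypothetical, 
and a plain Map stands in for the Hadoop job Configuration. It also does not 
answer the open question of whether a value set during getInputSplits is still 
picked up by the job.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Pig or the attached patch.
final class UberModeSketch {
    // Real Hadoop 2 property name, as quoted in the issue description.
    static final String UBER_ENABLE = "mapreduce.job.ubertask.enable";

    // Uber mode is only worthwhile when the combined input is a single split
    // and the loader reads from file-based storage.
    static boolean shouldEnableUber(int combinedSplitCount, boolean fileBasedStorage) {
        return fileBasedStorage && combinedSplitCount == 1;
    }

    // Returns a copy of the settings with uber mode switched on when the rule holds.
    // The Map is an assumed stand-in for org.apache.hadoop.conf.Configuration.
    static Map<String, String> configure(Map<String, String> conf,
                                         int combinedSplitCount,
                                         boolean fileBasedStorage) {
        Map<String, String> out = new HashMap<>(conf);
        if (shouldEnableUber(combinedSplitCount, fileBasedStorage)) {
            out.put(UBER_ENABLE, "true");
        }
        return out;
    }
}
```

With this shape the check depends only on the split count after combining, so 
no separate size threshold has to be tuned.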

A few other issues I see:
  - The current code is within okToRunLocal, so uber mode will never take 
effect if auto.local is not enabled.
  - I also don't like the idea of getting the input file sizes multiple 
times. That will add a lot of stress to the NameNode. We need to fix that 
even for auto local mode.
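
One way to avoid the repeated size lookups is to compute each size once and 
reuse it for both the auto local check and the uber check. The sketch below 
assumes a hypothetical InputSizeCache class; the lookup function is a stand-in 
for whatever call actually asks the NameNode for a file's length.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical cache for illustration; not part of Pig or the attached patch.
final class InputSizeCache {
    private final Map<String, Long> sizes = new HashMap<>();
    // Assumed stand-in for the NameNode-backed file length lookup.
    private final ToLongFunction<String> lookup;

    InputSizeCache(ToLongFunction<String> lookup) {
        this.lookup = lookup;
    }

    // Asks the lookup at most once per path; later calls reuse the stored value.
    long sizeOf(String path) {
        return sizes.computeIfAbsent(path, p -> lookup.applyAsLong(p));
    }

    // Total size of all inputs, served from the cache where possible.
    long totalSize(Iterable<String> paths) {
        long total = 0;
        for (String p : paths) {
            total += sizeOf(p);
        }
        return total;
    }
}
```

Both checks could then share one cache instance per job, so each input path 
is looked up only once.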

> Run as uber job if there is only one input split
> ------------------------------------------------
>
>                 Key: PIG-3365
>                 URL: https://issues.apache.org/jira/browse/PIG-3365
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Lorand Bendig
>              Labels: Performance
>         Attachments: PIG-3365.patch
>
>
> Hadoop 2 has support for uber mode (mapreduce.job.ubertask.enable=true) which 
> runs the map and reduce on the Application Master itself and reduces the overhead 
> of launching a separate map/reduce task. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
