[ https://issues.apache.org/jira/browse/PIG-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-5106:
------------------------------------
    Fix Version/s: 0.19.0
                       (was: 0.18.0)

> Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
> -----------------------------------------------------------------------------
>
>                 Key: PIG-5106
>                 URL: https://issues.apache.org/jira/browse/PIG-5106
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Artem Ervits
>            Priority: Major
>              Labels: newbie
>             Fix For: 0.19.0
>
>         Attachments: PIG-5106-0.patch, PIG-5106-1.patch
>
>
> Many of our classes extending InputFormat override listStatus as follows:
> {code}
> /*
>  * This is to support multi-level/recursive directory listing until
>  * MAPREDUCE-1577 is fixed.
>  */
> @Override
> protected List<FileStatus> listStatus(JobContext job) throws IOException {
>     return MapRedUtil.getAllFileRecursively(super.listStatus(job),
>             job.getConfiguration());
> }
> {code}
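> Now that we have dropped support for Hadoop 1.x, and the Hadoop 2.x FileInputFormat already lists input directories recursively when mapreduce.input.fileinputformat.input.dir.recursive is true, this can be optimized to: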
> {code}
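> @Override
> protected List<FileStatus> listStatus(JobContext job) throws IOException {
>     if (getInputDirRecursive(job)) {
>         // FileInputFormat has already expanded subdirectories; no second pass needed.
>         return super.listStatus(job);
>     } else {
>         /*
>          * mapreduce.input.fileinputformat.input.dir.recursive is not true
>          * by default for backward compatibility reasons.
>          */
>         return MapRedUtil.getAllFileRecursively(super.listStatus(job),
>                 job.getConfiguration());
>     }
> }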
> {code}
> That would avoid an extra pass over the already-expanded file list when users set
> mapreduce.input.fileinputformat.input.dir.recursive to true.
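To make the saving concrete, here is a hypothetical, Hadoop-free sketch in Java. It is not Pig's code: `RecursiveListingSketch`, `superListStatus` and `expandDirectories` are made-up stand-ins for `FileInputFormat.listStatus` and `MapRedUtil.getAllFileRecursively`, and the directory tree is an in-memory map rather than HDFS. It assumes, as in Hadoop 2.x, that a non-recursive listing returns subdirectories as plain entries, while a recursive listing returns only files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the PIG-5106 listing logic. A directory tree is
 * modelled as a map from a directory path to its direct children; any path
 * that is not a key in the map is treated as a plain file.
 */
class RecursiveListingSketch {

    /** Stand-in for FileInputFormat.listStatus: direct children, or every file when recursive. */
    static List<String> superListStatus(Map<String, List<String>> tree, String inputDir, boolean recursive) {
        List<String> out = new ArrayList<>();
        for (String child : tree.getOrDefault(inputDir, Collections.emptyList())) {
            if (recursive && tree.containsKey(child)) {
                out.addAll(superListStatus(tree, child, true)); // descend into subdirectory
            } else {
                out.add(child); // a file, or a directory left unexpanded
            }
        }
        return out;
    }

    /** Stand-in for MapRedUtil.getAllFileRecursively: expands any directories left in the list. */
    static List<String> expandDirectories(Map<String, List<String>> tree, List<String> entries) {
        List<String> out = new ArrayList<>();
        for (String entry : entries) {
            if (tree.containsKey(entry)) {
                out.addAll(expandDirectories(tree, tree.get(entry)));
            } else {
                out.add(entry);
            }
        }
        return out;
    }

    /** The proposed override: skip the extra pass when the listing already recursed. */
    static List<String> listStatus(Map<String, List<String>> tree, String inputDir, boolean inputDirRecursive) {
        if (inputDirRecursive) {
            return superListStatus(tree, inputDir, true);
        }
        // Property unset or false: keep Pig's historical recursive behaviour.
        return expandDirectories(tree, superListStatus(tree, inputDir, false));
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("/in", Arrays.asList("/in/a", "/in/sub"));
        tree.put("/in/sub", Arrays.asList("/in/sub/b"));
        System.out.println(listStatus(tree, "/in", true));  // [/in/a, /in/sub/b]
        System.out.println(listStatus(tree, "/in", false)); // [/in/a, /in/sub/b]
    }
}
```

Both modes yield the same files. The only difference is that the recursive branch skips `expandDirectories`, which would otherwise re-walk a list that is already flat.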



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
