[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178540#comment-13178540
 ] 

Mariappan Asokan commented on MAPREDUCE-2454:
---------------------------------------------

Attached a patch on top of the trunk.  All tests run by maven passed.

I would appreciate it if a committer could take a look at the patch and help
push it into the trunk.

In the meantime, I am working on a NullSortPlugin implementation.  If there is
enough interest, I can add it to the patch later.  The idea of a NullSortPlugin
is to not sort the map output at all!  On the Reduce side, the shuffled records
will be passed to the Reducer without any merging.

The NullSortPlugin is useful for solving the limit-N query problem when no
order-by is required (for a complete description of the problem, please refer
to HIVE-2004 and MAPREDUCE-1928).  The idea is to stop an MR job that does
simple filtering (no sorting needed) after a certain number of records have
been selected.

The steps involved are as follows:

* Set the parameters mapred.map.max.attempts and mapred.reduce.max.attempts to
1, set mapred.reduce.slowstart.completed.maps to 0, and set the number of
reducers to 1 for an MR job.

* Implement a Mapper that does condition-based filtering.

* Implement a Reducer that outputs only the first N records, prints a
diagnostic message to the log, and then throws an exception to stop the MR job.

The first step makes sure that the job is not restarted and that the reducer
is started right away.  The third step limits the number of Map tasks started
(which is the desired goal).  The diagnostic message in the log will be useful
for finding out whether the job was aborted or completed normally.
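
To make the three steps concrete, here is a rough sketch in plain Java.  It
does not use the Hadoop API or the proposed NullSortPlugin; it only models the
control flow.  The class LimitNSketch, its methods, and LimitReachedException
are hypothetical names made up for illustration.  The property
mapred.reduce.tasks is an assumption on my part for "set the number of
reducers to 1"; the steps above do not name it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of the limit-N steps; not Hadoop code.
class LimitNSketch {

    // Hypothetical exception the reducer throws to stop the job on purpose.
    static class LimitReachedException extends RuntimeException {
        LimitReachedException(String message) {
            super(message);
        }
    }

    // Step 1: job settings as plain key/value pairs.
    static Map<String, String> jobSettings() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapred.map.max.attempts", "1");
        conf.put("mapred.reduce.max.attempts", "1");
        conf.put("mapred.reduce.slowstart.completed.maps", "0");
        // Assumed property name for "number of reducers = 1".
        conf.put("mapred.reduce.tasks", "1");
        return conf;
    }

    // Step 2: map-side filtering; keeps only records matching the condition.
    static List<String> mapFilter(List<String> input, Predicate<String> condition) {
        List<String> selected = new ArrayList<>();
        for (String record : input) {
            if (condition.test(record)) {
                selected.add(record);
            }
        }
        return selected;
    }

    // Step 3: reduce-side limit. Records arrive unsorted and unmerged; once
    // 'limit' records are written, log a diagnostic and throw to abort.
    static void reduceLimit(Iterator<String> shuffled, int limit, List<String> output) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        while (shuffled.hasNext()) {
            output.add(shuffled.next());
            if (output.size() >= limit) {
                System.err.println("limit-N: wrote " + limit + " records, aborting job on purpose");
                throw new LimitReachedException("limit of " + limit + " records reached");
            }
        }
        // Falling out of the loop means fewer than 'limit' records matched,
        // so the job completes normally and no diagnostic is logged.
    }

    public static void main(String[] args) {
        System.out.println("settings: " + jobSettings());
        List<String> input = Arrays.asList("a1", "b2", "a3", "a4", "b5", "a6");
        List<String> output = new ArrayList<>();
        try {
            reduceLimit(mapFilter(input, r -> r.startsWith("a")).iterator(), 2, output);
            System.out.println("job completed normally: " + output);
        } catch (LimitReachedException e) {
            System.out.println("job aborted on purpose: " + output);
        }
    }
}
```

In a real job the exception would fail the single reduce attempt, and since
the maximum number of attempts is 1, the whole job stops instead of retrying.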
                
> Allow external sorter plugin for MR
> -----------------------------------
>
>                 Key: MAPREDUCE-2454
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Mariappan Asokan
>            Priority: Minor
>         Attachments: HadoopSortPlugin.pdf, KeyValueIterator.java, 
> MR-2454-trunkPatchPreview.gz, MapOutputSorter.java, 
> MapOutputSorterAbstract.java, ReduceInputSorter.java, 
> mr-2454-on-mr-279-build82.patch.gz
>
>
> Define interfaces and some abstract classes in the Hadoop framework to 
> facilitate external sorter plugins both on the Map and Reduce sides.
