[ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178540#comment-13178540 ]
Mariappan Asokan commented on MAPREDUCE-2454:
---------------------------------------------

Attached a patch on top of the trunk. All tests run by Maven passed. I would appreciate it if a committer could take a look at the patch and help push it into the trunk.

In the meantime, I am working on a NullSortPlugin implementation. If there is enough interest, I can add it to the patch later. The idea of a NullSortPlugin is to not sort the map output at all! On the Reduce side, the shuffled records will be passed to the Reducer without any merging.

The NullSortPlugin is useful for solving the limit-N query problem that does not require order-by (for a complete description of the problem, please refer to HIVE-2004 and MAPREDUCE-1928). The idea is to stop an MR job that does simple filtering (no sorting needed) after a certain number of records are selected. The steps involved are as follows:

* Set the parameters mapred.map.max.attempts and mapred.reduce.max.attempts to 1, set mapred.reduce.slowstart.completed.maps to 0, and set the number of reducers to 1 for an MR job.
* Implement a Mapper that does condition-based filtering.
* Implement a Reducer that outputs only the first N records, prints a diagnostic message in the log, and throws an exception to stop the MR job.

The first step makes sure that the job is not restarted and that the reducer is started right away. The third step limits the number of Map tasks started (which is the desired goal). The diagnostic message in the log will be useful for finding out whether the job aborted or completed normally.
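
As a rough illustration of the steps above, here is a plain-Java simulation of the control flow only. It is a sketch under assumptions, not part of the patch: it uses no Hadoop classes, and the names LimitNSketch, LimitReducer, LimitReachedException, Result, and run are hypothetical. The job settings from the first step are not modeled; the loop simply stands in for unsorted, unmerged records reaching a single reducer as they are produced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Plain-Java simulation of the limit-N idea; no Hadoop classes are used. */
class LimitNSketch {

    /** Hypothetical exception thrown on purpose once N records have been emitted. */
    static class LimitReachedException extends RuntimeException {
        LimitReachedException(String message) {
            super(message);
        }
    }

    /** Stand-in for the Reducer: keeps the first N records, then aborts. */
    static class LimitReducer {
        private final int limit;
        private final List<String> output = new ArrayList<>();

        LimitReducer(int limit) {
            if (limit < 1) {
                throw new IllegalArgumentException("limit must be at least 1");
            }
            this.limit = limit;
        }

        void reduce(String record) {
            output.add(record);
            if (output.size() >= limit) {
                // Diagnostic message: lets a reader of the log tell a deliberate
                // abort apart from a normal completion.
                System.err.println("Limit of " + limit + " records reached; aborting on purpose.");
                throw new LimitReachedException("limit reached");
            }
        }

        List<String> output() {
            return output;
        }
    }

    /** Outcome of one simulated run. */
    static class Result {
        final List<String> output;
        final int recordsRead;
        final boolean aborted;

        Result(List<String> output, int recordsRead, boolean aborted) {
            this.output = output;
            this.recordsRead = recordsRead;
            this.aborted = aborted;
        }
    }

    /** Filters records (the Mapper's role) and hands each match straight to the reducer. */
    static Result run(Iterable<String> input, Predicate<String> filter, int limit) {
        LimitReducer reducer = new LimitReducer(limit);
        int read = 0;
        boolean aborted = false;
        try {
            for (String record : input) {
                read++;
                if (filter.test(record)) {
                    reducer.reduce(record); // no sorting, no merging
                }
            }
        } catch (LimitReachedException e) {
            aborted = true; // remaining input is never read
        }
        return new Result(reducer.output(), read, aborted);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a1", "b2", "a3", "a4", "a5");
        Result r = run(input, s -> s.startsWith("a"), 2);
        System.out.println(r.output + " read=" + r.recordsRead + " aborted=" + r.aborted);
    }
}
```

In this sketch the exception cuts the input loop short, which mirrors how the aborted job avoids starting further Map tasks. Whether a real job behaves this way depends on the settings in the first step.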
> Allow external sorter plugin for MR
> -----------------------------------
>
>                 Key: MAPREDUCE-2454
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Mariappan Asokan
>            Priority: Minor
>         Attachments: HadoopSortPlugin.pdf, KeyValueIterator.java, MR-2454-trunkPatchPreview.gz, MapOutputSorter.java, MapOutputSorterAbstract.java, ReduceInputSorter.java, mr-2454-on-mr-279-build82.patch.gz
>
>
> Define interfaces and some abstract classes in the Hadoop framework to facilitate external sorter plugins both on the Map and Reduce sides.