[ https://issues.apache.org/jira/browse/HADOOP-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved HADOOP-2921.
--------------------------------------

    Resolution: Fixed

Stale

> align map splits on sorted files with key boundaries
> ----------------------------------------------------
>
>                 Key: HADOOP-2921
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2921
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 0.16.0
>            Reporter: Joydeep Sen Sarma
>
> (this is something that we have implemented in the application layer - may be
> useful to have in hadoop itself).
> long term log storage systems often keep data sorted (by some sort-key).
> future computations on such files can often benefit from this sort order. if
> the job requires grouping by the sort-key - then it should be possible to do
> reduction in the map stage itself.
> this is not natively supported by hadoop (except in the degenerate case of 1
> map file per task) since splits can span the sort-key. however aligning the
> data read by the map task to sort key boundaries is straightforward - and
> this would be a useful capability to have in hadoop.
> the definition of the sort key should be left up to the application (it's not
> necessarily the key field in a Sequencefile) through a generic interface -
> but otherwise - the sequencefile and text file readers can use the extracted
> sort key to align map task data with key boundaries.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
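
For illustration only, the alignment idea in the issue description can be sketched in a few lines of Python. This is not Hadoop code and not the reporter's implementation: the function name `align_splits`, the `sort_key` callback, and the use of record indexes in place of byte offsets are all assumptions made to keep the sketch self-contained. Each nominal split boundary is pushed forward until the sort key changes, so no key's records are divided between two map tasks.

```python
from typing import Callable, Hashable, List, Sequence, Tuple, TypeVar

R = TypeVar("R")


def align_splits(
    records: Sequence[R],
    boundaries: Sequence[int],
    sort_key: Callable[[R], Hashable],
) -> List[Tuple[int, int]]:
    """Return [start, end) record ranges whose edges fall on sort-key changes.

    `records` must already be sorted by `sort_key`. `boundaries` holds the
    nominal split points (hypothetical record indexes, standing in for the
    byte offsets a real input format would use).
    """
    n = len(records)
    aligned: List[int] = []
    prev = 0
    for b in boundaries:
        b = max(b, prev)
        # Advance while the record at b shares its key with the one before it.
        while 0 < b < n and sort_key(records[b]) == sort_key(records[b - 1]):
            b += 1
        b = min(b, n)
        # Keep only boundaries that produce a non-empty, in-range split.
        if prev < b < n:
            aligned.append(b)
            prev = b
    starts = [0] + aligned
    ends = aligned + [n]
    return [(s, e) for s, e in zip(starts, ends) if s < e]


# Usage: six log records sorted by user, with nominal splits every two records.
log = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("b", 5), ("c", 6)]
splits = align_splits(log, [2, 4], sort_key=lambda rec: rec[0])
```

Here the second nominal boundary (index 4) falls inside the run of `"b"` records, so it moves forward to index 5. With ranges aligned this way, each map task sees every record for the keys it owns and can group or reduce them locally, which is the benefit the issue describes. The application-supplied `sort_key` callback mirrors the request that the key definition stay generic.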