Check the Yahoo! paper on TeraSort for more details: http://sortbenchmark.org/YahooHadoop.pdf
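The short reply below leaves out one point the paper covers. With more than one reducer, each reducer's output file is sorted by key. The files are only globally ordered, however, if the partitioner assigns contiguous key ranges instead of hashing keys (hashing is what Hadoop's default HashPartitioner does). TeraSort gets range partitions by sampling the input to choose split points. The plain-Java sketch below simulates that idea without a cluster. The names `chooseSplitPoints`, `partition` and `distributedSort` are hypothetical, as is the "every second key" sampling rate. This is not TeraSort's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java simulation of range partitioning (no Hadoop classes used).
class RangePartitionSketch {
    // Choose numPartitions - 1 split points, evenly spaced through a sorted sample.
    static List<String> chooseSplitPoints(List<String> sample, int numPartitions) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        if (sorted.isEmpty()) {
            return splits; // no sample: every key falls into partition 0
        }
        for (int i = 1; i < numPartitions; i++) {
            splits.add(sorted.get(i * sorted.size() / numPartitions));
        }
        return splits;
    }

    // Partition index = number of split points that are <= key.
    // Because splits are sorted, every key in partition p is <= every key in p + 1.
    static int partition(String key, List<String> splits) {
        int p = 0;
        while (p < splits.size() && key.compareTo(splits.get(p)) >= 0) {
            p++;
        }
        return p;
    }

    static List<String> distributedSort(List<String> keys, int numPartitions) {
        // Hypothetical sampling rate: take every second key.
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += 2) {
            sample.add(keys.get(i));
        }
        List<String> splits = chooseSplitPoints(sample, numPartitions);

        // "Shuffle": route each key to the partition (reducer) owning its range.
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String k : keys) {
            parts.get(partition(k, splits)).add(k);
        }

        // Each "reducer" sorts only its own keys; concatenating the outputs
        // in partition order yields a globally sorted result.
        List<String> out = new ArrayList<>();
        for (List<String> part : parts) {
            Collections.sort(part);
            out.addAll(part);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
            "pear", "apple", "fig", "kiwi", "banana", "cherry", "date", "grape");
        System.out.println(distributedSort(keys, 3));
        // prints [apple, banana, cherry, date, fig, grape, kiwi, pear]
    }
}
```

Each partition is sorted independently, so reading partition 0, 1, 2, ... in order gives the full sorted list. In Hadoop itself, `TotalOrderPartitioner` together with `InputSampler` provides this kind of range partitioning for general jobs.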
Praveen

On Sat, Nov 26, 2011 at 7:14 PM, Prashant Sharma <prashant.ii...@gmail.com> wrote:
> Madhu,
>
> You can check out the sorting code in the examples. Actually, you don't need to
> do anything for sorting. The MapReduce framework does the sorting for you (a
> merge sort, which happens during the shuffle phase, before the reducer even
> starts); all you need to do is make the column you want to sort on the key
> emitted by your map.
>
> So, for example, you have:
>
>     @Override
>     protected void map(LongWritable key, Text value, Context context)
>             throws IOException, InterruptedException {
>         // splitter (delimiter regex) and field (1-based column number)
>         // are assumed to be fields of the Mapper class.
>         String[] tokenArray = value.toString().split(splitter);
>         context.write(new Text(tokenArray[field - 1]), value);
>     }
>
> And in the reducer you don't need anything special either:
>
>     @Override
>     public void reduce(Text key, Iterable<Text> values, Context context)
>             throws IOException, InterruptedException {
>         // Keys arrive in sorted order; emit every line for this key.
>         for (Text val : values) {
>             context.write(NullWritable.get(), val);
>         }
>     }
>
> Thanks,
> HTH
>
> On Sat, Nov 26, 2011 at 6:33 PM, madhu_sushmi <madhu_sus...@yahoo.com> wrote:
>
> > Hi,
> > I need to implement distributed sorting using Hadoop. I am quite new to
> > Hadoop and I am getting confused. If I want to implement merge sort, what
> > should my map and reduce be doing? Should all the sorting happen on the
> > reduce side?
> >
> > Please help. This is an urgent requirement. Please guide me.
> >
> > Thanks,
> > Madhu
> > --
> > View this message in context:
> > http://old.nabble.com/Hadoop---Distributed-sorting-tp32876785p32876785.html
> > Sent from the Hadoop core-dev mailing list archive at Nabble.com.
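Prashant's point, that you only choose the key and the framework does the sorting, can be checked without a cluster. The sketch below simulates the three stages in plain Java. A map step emits the chosen column as the key. A sort of the map output by key stands in for the shuffle. A reduce step then emits the values unchanged. The class and method names are hypothetical and no Hadoop API is used. It assumes `splitter` is a regex delimiter and `field` a 1-based column index, matching the `field - 1` in his mapper.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the sort-by-column job; no Hadoop classes are used.
class ShuffleSortSketch {
    // "map": emit the chosen column as the key and the whole line as the value.
    static List<Map.Entry<String, String>> map(List<String> lines, String splitter, int field) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String line : lines) {
            String[] tokens = line.split(splitter);
            out.add(new AbstractMap.SimpleEntry<>(tokens[field - 1], line));
        }
        return out;
    }

    // "shuffle": sort the map output by key. List.sort is a stable merge sort
    // (TimSort), loosely standing in for the merge Hadoop does on map output.
    static void shuffleSort(List<Map.Entry<String, String>> pairs) {
        pairs.sort(Map.Entry.comparingByKey());
    }

    // "reduce": drop the key and emit every value, like the NullWritable reducer.
    // Grouping by key is skipped because this reducer emits each value unchanged.
    static List<String> reduce(List<Map.Entry<String, String>> sortedPairs) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : sortedPairs) {
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("3,carol", "1,alice", "2,bob");
        List<Map.Entry<String, String>> pairs = map(lines, ",", 1);
        shuffleSort(pairs);
        System.out.println(reduce(pairs)); // prints [1,alice, 2,bob, 3,carol]
    }
}
```

Keys are compared as strings here, much as `Text` keys are compared byte by byte in Hadoop, so "10" sorts before "9". For a numeric sort column, zero-pad the values or emit a numeric key type such as `LongWritable` instead.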