Re: Hadoop - Distributed sorting

Praveen Sripati Mon, 28 Nov 2011 06:45:55 -0800

Check the Yahoo Paper on TeraSort for more details.

http://sortbenchmark.org/YahooHadoop.pdf
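
As a rough illustration of the idea in that paper: TeraSort is described there as a standard map/reduce sort with a custom partitioner that uses sampled keys to define the key range for each reduce, so that every key sent to reduce i sorts before every key sent to reduce i+1. The sketch below imitates that in plain Java without any Hadoop classes. The class and method names are hypothetical, and for brevity it uses the whole input as the sample, which a real job would not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Local sketch of range partitioning for a total-order sort.
// Hypothetical names; this is not the TeraSort source code.
class RangePartitionSketch {

    // Pick (numPartitions - 1) split points from a sorted copy of the sample.
    // Assumes the sample holds at least one key.
    static String[] splitPoints(List<String> sample, int numPartitions) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        String[] splits = new String[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted.get(i * sorted.size() / numPartitions);
        }
        return splits;
    }

    // Map a key to a partition so that lower partitions never hold larger keys.
    static int partitionFor(String key, String[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        // Found: go to the partition just after that split point.
        // Not found: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    // Partition the keys, sort each partition on its own, then concatenate.
    static List<String> sortAll(List<String> keys, int numPartitions) {
        String[] splits = splitPoints(keys, numPartitions); // sample = whole input here
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String k : keys) {
            parts.get(partitionFor(k, splits)).add(k);
        }
        List<String> out = new ArrayList<>();
        for (List<String> p : parts) {
            Collections.sort(p); // stands in for the per-reducer sort
            out.addAll(p);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("pear", "apple", "fig", "kiwi", "banana", "cherry");
        System.out.println(sortAll(keys, 3));
    }
}
```

Because the partitions cover disjoint, increasing key ranges, concatenating the per-partition outputs in partition order gives one globally sorted sequence. Without such a partitioner, each reducer's output is sorted only within itself.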


Praveen


On Sat, Nov 26, 2011 at 7:14 PM, Prashant Sharma
<[email protected]> wrote:

> Madhu,
>
>  You can check out the sorting code in the examples. Actually, you don't
> need to do anything for sorting. The map-reduce framework does the
> (merge-sort) sorting for you; it happens during the shuffle phase, before
> the reducer even starts. All you need to do is make the column you want to
> sort on your key in the map.
>
> So, for example, your mapper could be:
>
>
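>    @Override
>    protected void map(LongWritable key, Text value, Context context)
>            throws IOException, InterruptedException {
>        // splitter (delimiter regex) and field (1-based column index) are
>        // assumed to be fields of the mapper class.
>        String[] tokenArray = value.toString().split(splitter);
>        // Emit the sort column as the key; the whole record is the value.
>        context.write(new Text(tokenArray[field - 1]), value);
>    }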
>
>
> And in the reducer you don't need anything either:
>
>    @Override
>    public void reduce(Text key, Iterable<Text> values, Context context)
>            throws IOException, InterruptedException {
>        // Keys arrive already sorted; write each record back out unchanged.
>        for (Text val : values) {
>            context.write(NullWritable.get(), val);
>        }
>    }
>
> Thanks,
> HTH
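
The mapper and reducer quoted above can be tried out without a cluster. The following is a minimal local sketch, assuming comma-separated lines and a hypothetical helper named sortByField: the "map" step keys each line by the chosen column, a TreeMap stands in for the framework's sort during shuffle, and the "reduce" step writes the values back unchanged. It only imitates the data flow and uses no Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local imitation of "make the sort column your map key" (hypothetical names).
class SortByColumnSketch {

    static List<String> sortByField(List<String> lines, String splitter, int field) {
        // "Map" + "shuffle": key = chosen column (1-based, as in the quoted mapper),
        // value = the whole line. TreeMap keeps keys in sorted order.
        TreeMap<String, List<String>> shuffled = new TreeMap<>();
        for (String line : lines) {
            String key = line.split(splitter)[field - 1];
            shuffled.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
        }
        // "Reduce": emit every value unchanged, in key order.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffled.entrySet()) {
            out.addAll(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("3,carol", "1,alice", "2,bob");
        for (String line : sortByField(lines, ",", 1)) {
            System.out.println(line);
        }
    }
}
```

Note that the keys here are compared as text, as they would be with a Text key, so "10" sorts before "9". For a numeric column, a numeric key type such as LongWritable would be the more likely choice in a real job.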
>
> On Sat, Nov 26, 2011 at 6:33 PM, madhu_sushmi <[email protected]>
> wrote:
>
> >
> > Hi,
> > I need to implement distributed sorting using Hadoop. I am quite new to
> > Hadoop and I am getting confused. If I want to implement merge sort, what
> > should my map and reduce be doing? Should all the sorting happen on the
> > reduce side?
> >
> > Please help. This is an urgent requirement. Please guide me.
> >
> > Thanks,
> > Madhu
> > --
> > View this message in context:
> > http://old.nabble.com/Hadoop---Distributed-sorting-tp32876785p32876785.html
> > Sent from the Hadoop core-dev mailing list archive at Nabble.com.
> >
> >
>
