Re: Hadoop - Distributed sorting

Prashant Sharma Sat, 26 Nov 2011 05:45:13 -0800

Madhu,

  You can check out the sorting code in the examples. Actually, you don't
need to do anything for sorting. The MapReduce framework does the sorting
for you (a merge sort, which happens during the shuffle phase before the
reducer even starts). All you need to do is make the column you want to
sort on your key in the map.


So, for example, your mapper could be:


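    // splitter (a regex) and field (a 1-based column index) are assumed
    // to be defined elsewhere in the mapper class.
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] tokenArray = value.toString().split(splitter);
        // Emit the sort column as the key; the whole line is the value.
        context.write(new Text(tokenArray[field - 1]), value);
    }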


And in the reducer you don't need to do anything either:

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Keys arrive already sorted, so just write the values back out.
        for (Text val : values) {
            context.write(NullWritable.get(), val);
        }
    }
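
If it helps to see why the mapper and reducer above are enough, here is a
rough plain-Java sketch of the same flow. It is not Hadoop code: the class
name ShuffleSortSketch and the method sortByField are made up for
illustration, and a TreeMap stands in for the framework's sort-by-key
during the shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a single-reducer sort job; not part of Hadoop.
class ShuffleSortSketch {

    // Returns the input lines ordered by the given 1-based field.
    static List<String> sortByField(List<String> lines, String splitter, int field) {
        // "Map" + "shuffle": key each line by the chosen column. The TreeMap
        // keeps keys in sorted order and groups lines that share a key.
        Map<String, List<String>> shuffled = new TreeMap<>();
        for (String line : lines) {
            String key = line.split(splitter)[field - 1];
            shuffled.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
        }
        // "Reduce": walk the keys in order and emit the values unchanged.
        List<String> output = new ArrayList<>();
        for (List<String> values : shuffled.values()) {
            output.addAll(values);
        }
        return output;
    }
}
```

For instance, calling sortByField on the lines "3,cherry", "1,apple" and
"2,banana" with splitter "," and field 2 gives them back in the order
apple, banana, cherry. Two caveats, as far as I know: the sketch behaves
like a job with a single reducer (with several reducers each output file is
sorted on its own, and a globally sorted result needs a range partitioner
such as Hadoop's TotalOrderPartitioner), and Text keys compare as text, so
a numeric column should use a numeric key type such as LongWritable.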
Thanks,
HTH

On Sat, Nov 26, 2011 at 6:33 PM, madhu_sushmi <madhu_sus...@yahoo.com> wrote:

>
> Hi,
> I need to implement distributed sorting using Hadoop. I am quite new to
> Hadoop and I am getting confused. If I want to implement merge sort, what
> should my map and reduce be doing? Should all the sorting happen on the
> reduce side?
>
> Please help. This is an urgent requirement. Please guide me.
>
> Thanks,
> Madhu
> --
> View this message in context:
> http://old.nabble.com/Hadoop---Distributed-sorting-tp32876785p32876785.html
> Sent from the Hadoop core-dev mailing list archive at Nabble.com.
>
>
