Hi,

I am a complete newbie to Spark and MapReduce frameworks and have a basic
question about the reduce function. I was working on the word count example
and got stuck at the reduce stage, where the summing happens.

I am trying to understand how reduceByKey works in Spark, using Java as the
programming language.

Say I have the sentence "I am who I am". I break the sentence into words
and store them as a list: [I, am, who, I, am]
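In code, I do the split roughly like this (a sketch on my side, assuming
Spark 1.x, where FlatMapFunction.call returns an Iterable, and where sc is
my JavaSparkContext):

// turn the sentence into an RDD of lines, then split each line on spaces,
// emitting one element per word
JavaRDD<String> lines = sc.parallelize(Arrays.asList("I am who I am"));
JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public Iterable<String> call(String line) {
        return Arrays.asList(line.split(" "));
    }
});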

Then this function assigns a 1 to each word:

JavaPairRDD<String, Integer> ones = words.mapToPair(
    new PairFunction<String, String, Integer>() {
        @Override
        public Tuple2<String, Integer> call(String s) {
            // pair each word with an initial count of 1
            return new Tuple2<String, Integer>(s, 1);
        }
    });

So the output is something like this:

(I,1) (am,1) (who,1) (I,1) (am,1)

Now, if I have 3 reducers running, each reducer will get a key and the
values associated with that key:

reducer 1 : (I,1) (I,1)

reducer 2 : (am,1) (am,1)

reducer 3 : (who,1)
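If I understand correctly, which "reducer" (partition) a key goes to is
decided by hashing the key. Here is a tiny sketch I wrote to check that
(my assumption: reduceByKey falls back to Spark's default HashPartitioner,
which routes each key by roughly key.hashCode() modulo the partition count):

import org.apache.spark.HashPartitioner;

// with 3 partitions, print where each distinct key would be routed
HashPartitioner p = new HashPartitioner(3);
for (String key : new String[] {"I", "am", "who"}) {
    System.out.println(key + " -> partition " + p.getPartition(key));
}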

I wanted to know:

a. What exactly happens in the code below?

b. What are the type parameters to the Function2?

c. Basically, how is the resulting JavaPairRDD formed?

JavaPairRDD<String, Integer> counts = ones.reduceByKey(
    new Function2<Integer, Integer, Integer>() {
        @Override
        public Integer call(Integer i1, Integer i2) {
            // merge two counts for the same key into one
            return i1 + i2;
        }
    });
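For reference, here is the complete program I am experimenting with, in case
that helps (a minimal, self-contained sketch that runs in local mode,
assuming spark-core is on the classpath):

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[3]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // "I am who I am" already split into words
        JavaRDD<String> words = sc.parallelize(Arrays.asList("I", "am", "who", "I", "am"));

        // assign a 1 to each word: (I,1) (am,1) (who,1) (I,1) (am,1)
        JavaPairRDD<String, Integer> ones = words.mapToPair(
            new PairFunction<String, String, Integer>() {
                @Override
                public Tuple2<String, Integer> call(String s) {
                    return new Tuple2<String, Integer>(s, 1);
                }
            });

        // sum the 1s per key
        JavaPairRDD<String, Integer> counts = ones.reduceByKey(
            new Function2<Integer, Integer, Integer>() {
                @Override
                public Integer call(Integer i1, Integer i2) {
                    return i1 + i2;
                }
            });

        for (Tuple2<String, Integer> t : counts.collect()) {
            System.out.println(t._1() + ": " + t._2());
        }
        sc.stop();
    }
}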

Thanks!
