I tried groupByKey and noticed that it did not group all values into the
same group.

In my test dataset (a Pair rdd) I have 16 records, where there are only 4
distinct keys, so I expected there to be 4 records in the groupByKey
object, but instead there were 8. Each of the 4 distinct keys appear 2
times.

Is this the expected behavior? I need to be able to get ALL values
associated with each key grouped into a SINGLE record. Is it possible?

Arun

p.s. reducebykey will not be sufficient for me

Reply via email to