I tried groupByKey and noticed that it did not put all the values for a given key into a single group.
My test dataset (a pair RDD) has 16 records with only 4 distinct keys, so I expected the groupByKey result to contain 4 records. Instead it contained 8: each of the 4 distinct keys appears twice. Is this the expected behavior? I need ALL values associated with each key grouped into a SINGLE record. Is that possible?

Arun

P.S. reduceByKey will not be sufficient for me.
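
To make the expectation concrete, here is a minimal plain-Python sketch (not Spark itself) of the grouping I assumed groupByKey would give me. The 16 records and the key names `k0` to `k3` are made-up placeholders, not my real data, and `group_by_key` is a hypothetical helper that only models the semantics I expected: one output record per distinct key, holding every value seen for that key.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Hypothetical stand-in for the expected semantics:
    collect every value under its key, one entry per distinct key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


# Placeholder data (an assumption): 16 records over 4 distinct keys.
records = [(f"k{i % 4}", i) for i in range(16)]

grouped = group_by_key(records)

print(len(grouped))   # 4 records expected, one per distinct key
print(grouped["k0"])  # [0, 4, 8, 12]
```

With this model I get 4 grouped records of 4 values each, which is what I expected from groupByKey instead of the 8 records I actually saw.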