Can you give a bit more information?

Release of Spark you're using
Minimal dataset that shows the problem

Cheers

On Mon, Jan 4, 2016 at 3:55 PM, Arun Luthra <arun.lut...@gmail.com> wrote:

> I tried groupByKey and noticed that it did not group all values into the
> same group.
>
> In my test dataset (a Pair rdd) I have 16 records, where there are only 4
> distinct keys, so I expected there to be 4 records in the groupByKey
> object, but instead there were 8. Each of the 4 distinct keys appear 2
> times.
>
> Is this the expected behavior? I need to be able to get ALL values
> associated with each key grouped into a SINGLE record. Is it possible?
>
> Arun
>
> p.s. reducebykey will not be sufficient for me
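
One thing that may be worth ruling out while putting the minimal dataset together: hash-based grouping only merges keys that are equal according to equals() and hashCode(). The thread does not say what the key type is, so this is only a guess at a possible cause, not a diagnosis. The sketch below does not use Spark at all; it mimics grouping with a plain java.util.HashMap, and the IdentityKey and ValueKey classes are hypothetical names made up for the illustration. It also does not reproduce the exact count of 8 from the report; it only shows how keys that look identical can end up in separate groups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class KeyGroupingSketch {

    // Hypothetical key type WITHOUT equals/hashCode overrides:
    // two instances with the same name are still different keys.
    static final class IdentityKey {
        final String name;
        IdentityKey(String name) { this.name = name; }
    }

    // Same field, but with value-based equals/hashCode.
    static final class ValueKey {
        final String name;
        ValueKey(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof ValueKey && ((ValueKey) o).name.equals(name);
        }
        @Override public int hashCode() { return Objects.hash(name); }
    }

    static final String[] NAMES = {"a", "b", "c", "d"};

    // 16 records over 4 distinct names, as in the report.
    static List<IdentityKey> identityKeys() {
        List<IdentityKey> keys = new ArrayList<>();
        for (int i = 0; i < 16; i++) keys.add(new IdentityKey(NAMES[i % 4]));
        return keys;
    }

    static List<ValueKey> valueKeys() {
        List<ValueKey> keys = new ArrayList<>();
        for (int i = 0; i < 16; i++) keys.add(new ValueKey(NAMES[i % 4]));
        return keys;
    }

    // Group record indices by key with a HashMap and return the number of groups.
    static <K> int countGroups(List<K> keys) {
        Map<K, List<Integer>> groups = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            groups.computeIfAbsent(keys.get(i), k -> new ArrayList<>()).add(i);
        }
        return groups.size();
    }

    public static void main(String[] args) {
        System.out.println("identity keys: " + countGroups(identityKeys()) + " groups");
        System.out.println("value keys:    " + countGroups(valueKeys()) + " groups");
    }
}
```

With the identity-based key every record lands in its own group (16 groups), while the value-based key gives the expected 4. If the keys in the real job are plain strings or numbers, this does not apply and the minimal dataset should show what is actually going on.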