spark rdd grouping

Rajat Kumar Mon, 30 Nov 2015 17:46:41 -0800

Hi

I have a JavaPairRDD<K,V> rdd1. I want to group rdd1 by key while
preserving the partitions of the original RDD, so as to avoid a shuffle,
since I know that all records with the same key are already in the same
partition.


The pair RDD is constructed using the Kafka streaming low-level consumer,
which already places all records with the same key in the same partition.
Can I group them together without a shuffle?

Thanks
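
A minimal sketch of one way this might be done, assuming every partition's
groups fit in memory: group the records inside each partition instead of
calling groupByKey. The class and method names below are hypothetical, and
Map.Entry stands in for scala.Tuple2 so that the sketch runs without Spark
on the classpath.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups the records of ONE partition by key.
class PartitionLocalGrouping {

    // Consumes a partition's iterator and returns one (key, values) entry per key.
    // Assumption: all groups of a single partition fit in memory.
    public static <K, V> Iterator<Map.Entry<K, List<V>>> groupWithinPartition(
            Iterator<Map.Entry<K, V>> records) {
        // LinkedHashMap keeps keys in first-seen order.
        Map<K, List<V>> groups = new LinkedHashMap<>();
        while (records.hasNext()) {
            Map.Entry<K, V> record = records.next();
            groups.computeIfAbsent(record.getKey(), k -> new ArrayList<>())
                  .add(record.getValue());
        }
        return groups.entrySet().iterator();
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a single partition.
        List<Map.Entry<String, Integer>> partition = new ArrayList<>();
        partition.add(new SimpleEntry<>("a", 1));
        partition.add(new SimpleEntry<>("b", 2));
        partition.add(new SimpleEntry<>("a", 3));

        Iterator<Map.Entry<String, List<Integer>>> grouped =
                groupWithinPartition(partition.iterator());
        while (grouped.hasNext()) {
            Map.Entry<String, List<Integer>> e = grouped.next();
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        // prints:
        // a -> [1, 3]
        // b -> [2]
    }
}
```

In a Spark job the same loop would go inside the function passed to
rdd1.mapPartitionsToPair(f, true), working on Tuple2<K,V> records.
mapPartitionsToPair is a narrow transformation, so it does not shuffle; the
second argument (preservesPartitioning) only tells Spark to keep any
partitioner the RDD already has. Whether the function returns an Iterable or
an Iterator depends on the Spark version. The result is only correct if the
assumption in the question holds, that is, no key appears in more than one
partition.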
