If you want each key to be combined only once, you can map the keys into a reduced key space and then reduce on the new keys. Something like this:
val data = sc.parallelize(Array(
  (0, 0.030513227), (1, 0.11088216), (2, 0.69165534), (3, 0.78524816),
  (4, 0.8516909), (5, 0.37751913), (6, 0.05674714), (7, 0.27523404),
  (8, 0.40828508), (9, 0.9491552)))

data.map { case (k, v) => (k / 3, v) }.reduceByKey(_ + _)

That code buckets the keys into groups of three consecutive integers (0-2, 3-5, 6-8, 9) via integer division, then sums the values within each bucket.

Could you clarify: if you have the keys [141, 142, 143, 144, 145], do you want non-overlapping groups like [(141, 142, 143), (144, 145)], or do you need sliding-window groups like [(141, 142, 143), (142, 143, 144), (143, 144, 145), (144, 145)]?
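For what it's worth, here is a minimal sketch of the first (non-overlapping) interpretation, assuming a SparkContext named sc and made-up values for those keys. The same integer-division trick puts 141-143 in one bucket and 144-145 in another, so each element is combined exactly once:

// Hypothetical values for keys 141-145, purely for illustration.
val approx = sc.parallelize(Array(
  (141, 1.0), (142, 2.0), (143, 3.0), (144, 4.0), (145, 5.0)))

// 141/3 == 142/3 == 143/3 == 47 and 144/3 == 145/3 == 48,
// so reduceByKey sums each bucket once.
val summed = approx.map { case (k, v) => (k / 3, v) }.reduceByKey(_ + _)

summed.collect().foreach(println)  // prints (47,6.0) and (48,9.0)

If you need the sliding-window groups instead, each element has to contribute to several groups, so a simple key remapping like this won't be enough.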