If you want each key to be combined only once, you can just create a mapping
of keys to a reduced key space, something like this:

val data = sc.parallelize(Array(
  (0, 0.030513227), (1, 0.11088216), (2, 0.69165534),
  (3, 0.78524816), (4, 0.8516909), (5, 0.37751913),
  (6, 0.05674714), (7, 0.27523404), (8, 0.40828508),
  (9, 0.9491552)))

// Integer division collapses keys 0-2 into bucket 0, 3-5 into bucket 1,
// 6-8 into bucket 2 and 9 into bucket 3; reduceByKey then sums each bucket.
data.map { case (k, v) => (k / 3, v) }.reduceByKey(_ + _)

That code buckets keys by integer-dividing them by 3, so keys 0-2, 3-5, 6-8
and 9 each end up in the same bucket, and then sums the values within each
bucket. Could you clarify: if you have the keys [141, 142, 143, 144, 145], do
you want fixed groups like [(141, 142, 143), (144, 145)], or do you need
overlapping, sliding-window groups like [(141, 142, 143), (142, 143, 144),
(143, 144, 145), (144, 145)]?
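
If it is the second, sliding-window style you are after, here is a rough
sketch (assuming a window width of 3, a step of 1, and that summing per
window is still the goal): emit each record into every window that contains
its key, then reduce by the window's starting key.

val width = 3
// Each record (k, v) belongs to the windows starting at k - width + 1 .. k,
// so a key contributes to every group of `width` consecutive keys around it.
val windowed = data.flatMap { case (k, v) =>
  (k - width + 1 to k).map(start => (start, v))
}.reduceByKey(_ + _)

The resulting key is the window's starting key, so you would probably want to
filter out window starts below your smallest real key before collecting.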



