How about ...
// toy series keyed by a consecutive integer ID: (id, value)
val data = sc.parallelize(Array((1, 0.05), (2, 0.10), (3, 0.15)))
// shift every ID up by one and join back, pairing each element with its predecessor
val pairs = data.join(data.map(t => (t._1 + 1, t._2)))
It's a self-join, but one copy has its ID incremented by 1. I don't
know how performant it is, but it works. Note that each result is
keyed by the later element's ID, with the value pair ordered
(current, previous):
(2,(0.1,0.05))
(3,(0.15,0.1))
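If the IDs aren't already consecutive integers, zipWithIndex can
manufacture them first. Here's a slightly fuller sketch of the same
idea (series/indexed/shifted/consecutive are names I've made up, and
it assumes the RDD is already in time order):

// assign a consecutive Long index to each value, then key by it
val series = sc.parallelize(Seq(0.05, 0.10, 0.15))
val indexed = series.zipWithIndex.map { case (v, i) => (i, v) }
// shift every index up by one so element i lines up with element i-1
val shifted = indexed.map { case (i, v) => (i + 1, v) }
// yields (i, (current, previous)) for every i >= 1
val consecutive = indexed.join(shifted)

A threshold crossing is then just a filter over consecutive, e.g.
consecutive.filter { case (_, (cur, prev)) => prev < t && cur >= t }
for whatever threshold t you're detecting.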
Hi Spark community,
I have a design/algorithm question that I assume is common enough
that someone else has tackled it before. I have an RDD of time-series data
formatted as time-value tuples, RDD[(Double, Double)], and am trying to
extract threshold crossings. In order to do so, I first want to t