Hi Spark community,

I have a design/algorithm question that I assume is common enough for
someone else to have tackled before. I have an RDD of time-series data
formatted as time-value tuples, RDD[(Double, Double)], and am trying to
extract threshold crossings. In order to do so, I first want to transform
the RDD into pairs of time-sequential values.

For example:
Input: The time-series data:
(1, 0.05)
(2, 0.10)
(3, 0.15)
Output: Transformed into time-sequential pairs:
((1, 0.05), (2, 0.10))
((2, 0.10), (3, 0.15))

My initial thought was to try and utilize a custom partitioner. This
partitioner could ensure sequential data was kept together. Then I could
use "mapPartitions" to transform these lists of sequential data. Finally, I
would need some logic for creating sequential pairs across the boundaries
of each partition.

However I was hoping to get some feedback and ideas from the community.
Anyone have thoughts on a simpler solution?

Thanks,
Nick

Reply via email to