Hi Spark community, I have a design/algorithm question that I assume is common enough for someone else to have tackled before. I have an RDD of time-series data formatted as time-value tuples, RDD[(Double, Double)], and am trying to extract threshold crossings. In order to do so, I first want to transform the RDD into pairs of time-sequential values.
For example:

Input (the time-series data):
(1, 0.05)
(2, 0.10)
(3, 0.15)

Output (transformed into time-sequential pairs):
((1, 0.05), (2, 0.10))
((2, 0.10), (3, 0.15))

My initial thought was to use a custom partitioner that keeps sequential data together. Then I could use "mapPartitions" to transform each partition's run of sequential data into pairs. Finally, I would need some logic for creating the pairs that span partition boundaries. However, I was hoping to get some feedback and ideas from the community first. Does anyone have thoughts on a simpler solution?

Thanks,
Nick
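One sketch of a simpler alternative, assuming the RDD is (or can be) sorted by time: zipWithIndex gives every element a global index, and a self-join on index i against index i - 1 pairs each element with its successor. All names below are illustrative, not from an existing codebase.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SequentialPairs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sequential-pairs").setMaster("local[*]"))

    // Example time-series data as (time, value) tuples.
    val ts = sc.parallelize(Seq((1.0, 0.05), (2.0, 0.10), (3.0, 0.15)))

    // Sort by time, then attach a global index: ((time, value), i).
    val indexed = ts.sortByKey().zipWithIndex()

    // Key each element by its own index, and again by (index - 1),
    // so that joining on the key lines up element i with element i + 1.
    val byIndex = indexed.map { case (tv, i) => (i, tv) }
    val byPrev  = indexed.map { case (tv, i) => (i - 1, tv) }

    // values drops the index key, leaving ((t_i, v_i), (t_i+1, v_i+1)).
    val pairs = byIndex.join(byPrev).values

    pairs.collect().sortBy(_._1._1).foreach(println)
    sc.stop()
  }
}
```

The join shuffles, so it is not free, but it avoids hand-rolling the cross-partition boundary logic. If MLlib is on your classpath, I believe org.apache.spark.mllib.rdd.RDDFunctions also offers a sliding(2) method that produces the same windows more directly, though I'd double-check its behavior on your Spark version.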