Hi All, I'm new to Spark and I'd like some help understanding if a particular use case would be a good fit for Spark Streaming.
I have an imaginary stream of sensor data consisting of integers 1-10. Every time the sensor reads "10" I'd like to average all the numbers that were received since the last "10".

example input:  10 5 8 4 6 2 1 2 8 8 8 1 6 9 1 3 10 1 3 10 ...
desired output: 4.8, 2.0

I'm confused about what happens if sensor readings fall into different RDDs:

RDD1: 10 5 8 4 6 2 1 2 8 8 8
RDD2: 1 6 9 1 3 10 1 3 10
output: ???, 2.0

My imaginary sensor doesn't read at fixed time intervals, so breaking the stream into RDDs by time interval won't ensure the data is packaged properly. Additionally, multiple sensors are writing to the same stream (though I think flatMap can split the original stream into streams for individual sensors -- is that correct?).

My best guess for the processing is:

1) flatMap() to break out individual sensor streams
2) a custom parser that accumulates input data until a "10" is found, then creates a new output RDD for each sensor and data grouping
3) average the values from step 2

I would greatly appreciate pointers to some specific documentation or examples if you have seen something like this before.

Thanks,
David
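P.S. To make steps 2 and 3 a bit more concrete, here is a very rough sketch of the kind of thing I'm imagining, using updateStateByKey to keep a per-sensor buffer across batches. The input format ("sensorId,reading" lines on a socket), the checkpoint path, and all the names are placeholders I made up, and I'm not sure the ordering of values within a batch is guaranteed, so please treat this as a guess rather than working code:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SensorAverages {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SensorAverages").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/sensor-checkpoint")  // updateStateByKey needs checkpointing

    // Pretend each line arriving on the socket looks like "sensorA,7"
    val readings = ssc.socketTextStream("localhost", 9999).map { line =>
      val parts = line.split(",")
      (parts(0), parts(1).trim.toInt)
    }

    // Per-sensor state: (readings buffered since the last "10",
    //                    averages completed during this batch)
    def update(newValues: Seq[Int],
               state: Option[(List[Int], List[Double])]): Option[(List[Int], List[Double])] = {
      var buffer = state.map(_._1).getOrElse(List.empty[Int])
      var completed = List.empty[Double]
      // NB: I don't think the order of newValues within a batch is guaranteed
      for (v <- newValues) {
        if (v == 10) {
          if (buffer.nonEmpty) completed :+= buffer.sum.toDouble / buffer.size
          buffer = Nil
        } else {
          buffer :+= v
        }
      }
      Some((buffer, completed))
    }

    val state = readings.updateStateByKey(update _)

    // Emit only the averages that were completed in this batch
    val averages = state.flatMap { case (sensor, (_, completed)) =>
      completed.map(avg => (sensor, avg))
    }
    averages.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Does something along these lines make sense, or is there a better-suited primitive for this?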