Hi,

I have been looking into using Spark Streaming for the specific use case of joining events from multiple time-series streams. To make the use case concrete, here is a rough sketch of the kind of join I mean:
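(This is a minimal sketch only: the socket streams stand in for my actual Kinesis sources, and the 2-minute batch interval, host/ports, and the "key,eventTimeMillis,value" record format are assumptions for illustration.)

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object TwoStreamJoin {
  def main(args: Array[String]): Unit = {
    // local[4]: two receivers each occupy a core, leaving cores for processing
    val conf = new SparkConf().setAppName("TwoStreamJoin").setMaster("local[4]")
    val ssc  = new StreamingContext(conf, Seconds(120)) // 2-minute micro-batches

    // Stand-ins for my two Kinesis streams; each line is "key,eventTimeMillis,value"
    val left  = ssc.socketTextStream("localhost", 9998)
    val right = ssc.socketTextStream("localhost", 9999)

    // Parse a record into (key, (eventTime, value)) so the streams can be joined by key
    def parse(line: String): (String, (Long, String)) = {
      val Array(key, ts, value) = line.split(",", 3)
      (key, (ts.toLong, value))
    }

    // The join happens per micro-batch: two records only meet if they land in
    // the same batch RDD, regardless of their event timestamps
    val joined = left.map(parse).join(right.map(parse))
    joined.print()

    ssc.start()
    ssc.awaitTermination()
  }
}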
The part I am having a hard time understanding is the consistency semantics of such a join across multiple streams. As per [1], Section 4.3.4, I understand that Spark builds RDDs (micro-batches) across multiple streams at synchronized batch times. But those batch times presumably have no relation to the actual event times within each batch. So if I have two streams, each carrying 2 minutes' worth of data, I do not yet see how they could be ingested in a synchronized manner such that Spark maintains alignment of the event-time boundaries.

Put another way: as a producer of these streams (for example, into Kinesis), I have no notion of batch times. Given that, if I had multiple streams, I do not see how Spark could synchronize them on event time.

What am I missing?

Thanks,
Ashwin

[1] http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf