The coming data on Spark Streaming
Hi everyone,

I'm getting started with Spark Streaming and would like to ask a few things about how data arrives. I understand that Spark Streaming uses micro-batches: records are received by the workers and stored as an RDD. At defined intervals, the driver receives a pointer to the micro-batch RDD and can process the data with mappers and reducers.

1. Before the driver is called, can I work on the data? Can I do something with each object as it arrives at the workers?
2. A data stream is normally an ordered sequence of records, but when it arrives in micro-batches I receive many objects at once. How can I determine the order of the objects inside a batch? Can I extract an arrival timestamp or an ordered ID for each object?

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/The-coming-data-on-Spark-Streaming-tp27720.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
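The ordering workaround being asked about in question 2 can be sketched as follows. This is a plain-Python illustration of the idea, not the Spark API: Spark Streaming does not itself guarantee any ordering of records inside a micro-batch, so a common approach is to tag each record with a sequence ID and timestamp at the point of reception. The `tag_record` helper here is hypothetical.

```python
import itertools
import time

# Global arrival counter; in a real receiver this would live in the
# receiver's own state, one counter per receiver.
_seq = itertools.count()

def tag_record(record):
    """Attach an arrival sequence ID and timestamp to one record."""
    return {"seq": next(_seq), "ts": time.time(), "value": record}

# Simulate a micro-batch of records arriving at a worker.
incoming = ["a", "b", "c"]
batch = [tag_record(r) for r in incoming]

# Later, during batch processing, the original arrival order can be
# recovered by sorting on the sequence ID.
ordered = sorted(batch, key=lambda r: r["seq"])
print([r["value"] for r in ordered])  # ['a', 'b', 'c']
```

The same tagging trick answers both questions at once: the per-record work happens at arrival time (inside the tagging step), and the tag preserves the order for later batch processing.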
Re: The coming data on Spark Streaming
Anybody?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/The-coming-data-on-Spark-Streaming-tp27720p27771.html
Re: Prototype Implementation of Hierarchical Clustering on Spark
Hello,

Could you tell me how you applied MapReduce to bisecting k-means? I know how the classical algorithm works, but how did you parallelize the processing? Are all leaf nodes divided at the same time? If not, how? If so, how do you handle the last nodes? Dividing every leaf node at each iteration, you always have 2^i nodes after i iterations. What do you do if k = 7?

Thank you

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Prototype-Implementation-of-Hierarchical-Clustering-on-Spark-tp24467p27833.html
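The "what if k = 7?" concern can be sketched with a minimal, single-machine version of the bisecting k-means control loop (this is an illustration of the standard algorithm, not the prototype implementation being asked about). The usual answer is that you do not have to split every leaf each iteration: splitting one leaf at a time, typically the one with the largest sum of squared errors, adds exactly one cluster per step, so any k is reachable. In a distributed version, the inner 2-means step is what would be parallelized (point assignment as a map, centroid update as a reduce).

```python
def two_means(points, iters=10):
    """Plain 2-means on a list of 1-D points; returns the two halves."""
    a, b = min(points), max(points)  # seed centroids at the extremes
    for _ in range(iters):
        left = [p for p in points if abs(p - a) <= abs(p - b)]
        right = [p for p in points if abs(p - a) > abs(p - b)]
        if left:
            a = sum(left) / len(left)
        if right:
            b = sum(right) / len(right)
    return left, right

def sse(points):
    """Sum of squared errors of a cluster around its mean."""
    if not points:
        return 0.0
    mean = sum(points) / len(points)
    return sum((p - mean) ** 2 for p in points)

def bisecting_kmeans(points, k):
    clusters = [points]
    while len(clusters) < k:
        # Split only ONE leaf per iteration: the one with the largest
        # SSE. Each split adds exactly one cluster, so k need not be a
        # power of two.
        worst = max(clusters, key=sse)
        clusters.remove(worst)
        left, right = two_means(worst)
        clusters += [c for c in (left, right) if c]
    return clusters

data = [1, 2, 3, 10, 11, 12, 20, 21, 22, 30, 31, 32,
        40, 41, 42, 50, 51, 52, 60, 61, 62]
result = bisecting_kmeans(data, 7)
print(len(result))  # 7
```

Whether the Spark prototype splits one leaf at a time or all leaves per level is exactly the question above; the sketch only shows why a one-at-a-time strategy makes k = 7 unproblematic.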
Microbatches length
Hello folks,

I'm using Spark Streaming, and my question is simple. The documentation says that micro-batches arrive at intervals defined in real time (minutes, seconds). I want micro-batches of the same length: can I configure Spark Streaming to return a micro-batch once it reaches a given number of records?

Thanks.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Microbatches-length-tp27927.html
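As far as I know, the DStream API only supports time-based batch intervals, not count-based ones, so the usual workaround is to buffer records yourself and emit a "batch" only once it reaches a fixed length. A plain-Python sketch of that buffering idea (not a Spark API call; `fixed_length_batches` is a hypothetical helper):

```python
def fixed_length_batches(stream, batch_size):
    """Group an incoming record stream into batches of batch_size records."""
    buffer = []
    for record in stream:
        buffer.append(record)
        if len(buffer) == batch_size:
            yield buffer
            buffer = []
    # Records left over when the stream ends form a final, shorter batch.
    if buffer:
        yield buffer

batches = list(fixed_length_batches(range(10), 4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

In Spark Streaming itself, the equivalent trick is usually done upstream (e.g. having the producer emit fixed-size groups) or by accumulating records across time-based micro-batches with stateful operations until the count is reached.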