The coming data on Spark Streaming

2016-09-14 Thread pcandido
Hi everyone,

I'm getting started with Spark Streaming and would like to ask a few questions
about how data arrives.

I understand that Spark Streaming uses micro-batches: records are received by
the workers and stored as an RDD. At defined intervals, the master receives a
pointer to the micro-batch RDD and can process the data using mappers and
reducers.

1. Before the master is called, can I work on the data? Can I do something
with each object as it arrives on the workers?
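In Spark Streaming, per-record work normally happens inside the DStream transformations themselves (they already run on the workers), or inside a custom Receiver, whose store() is called once per arriving record. As a rough illustration only (plain Python, not the Spark API; `ToyReceiver`, `on_arrival`, and `take_batch` are made-up names), the idea of hooking each object before the batch interval fires looks like this:

```python
class ToyReceiver:
    """Toy stand-in for a streaming receiver: applies a per-record
    hook the moment each object arrives, before it is buffered
    into the current micro-batch."""

    def __init__(self, on_arrival):
        self.on_arrival = on_arrival  # user code, run once per record at arrival
        self.current_batch = []

    def store(self, record):
        # Work on the record immediately, before the batch interval fires.
        processed = self.on_arrival(record)
        self.current_batch.append(processed)

    def take_batch(self):
        # Simulates the end of a batch interval: hand the buffer off.
        batch, self.current_batch = self.current_batch, []
        return batch

receiver = ToyReceiver(on_arrival=str.upper)
for word in ["spark", "streaming"]:
    receiver.store(word)
print(receiver.take_batch())  # -> ['SPARK', 'STREAMING']
```

A real custom receiver would extend org.apache.spark.streaming.receiver.Receiver and call store() from its own receive loop; the per-record hook would simply run before store().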

2. A data stream is normally an ordered sequence of records, but when it
arrives in micro-batches, I receive many objects at the same time. How can I
determine the order of the objects inside a batch? Can I extract an arrival
timestamp or an ordered ID for each object?
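Spark does not guarantee that the original arrival order survives parallel processing of a batch, so a common workaround is to tag each record at receive time and sort by the tag afterwards. A minimal sketch of that idea (plain Python; `tag_arrivals` is a hypothetical helper, not a Spark API):

```python
import itertools
import time

def tag_arrivals(records, counter=None):
    """Wrap each incoming record with a monotonically increasing
    arrival ID and a wall-clock timestamp, so the original order
    can be recovered after the batch is processed in parallel."""
    counter = counter or itertools.count()
    return [(next(counter), time.time(), rec) for rec in records]

batch = tag_arrivals(["a", "b", "c"])
# Even if the batch is later shuffled, sorting by the ID restores order.
shuffled = sorted(batch, key=lambda t: t[2], reverse=True)
restored = [rec for _, _, rec in sorted(shuffled)]
print(restored)  # -> ['a', 'b', 'c']
```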

Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/The-coming-data-on-Spark-Streaming-tp27720.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: The coming data on Spark Streaming

2016-09-21 Thread pcandido
Anybody?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/The-coming-data-on-Spark-Streaming-tp27720p27771.html



Re: Prototype Implementation of Hierarchical Clustering on Spark

2016-10-03 Thread pcandido
Hello,
Could you tell me how you applied MapReduce to Bisecting K-Means?
I know how the classical BKM works, but how did you parallelize the
processing?
Are all leaf nodes divided at the same time? If not, how?
If yes, how do you handle the last nodes? Dividing every leaf node per
iteration, you always have 2^it nodes; what do you do if k = 7?
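For what it's worth, the usual bisecting k-means formulation avoids the 2^it problem by splitting only one leaf per iteration (commonly the largest cluster, or the one with the highest SSE), so k - 1 splits yield exactly k clusters for any k, including k = 7. A toy sketch of that control flow (plain Python, 1-D data, with a largest-gap cut standing in for a real 2-means run; not the prototype's actual code):

```python
def two_means_1d(points):
    """Minimal 1-D 2-means stand-in: sort and cut at the largest gap."""
    pts = sorted(points)
    gaps = [(pts[i + 1] - pts[i], i) for i in range(len(pts) - 1)]
    _, cut = max(gaps)
    return pts[:cut + 1], pts[cut + 1:]

def bisecting_kmeans(points, k):
    """Split one leaf per iteration (here: the largest leaf), so after
    k - 1 splits there are exactly k clusters -- any k is reachable,
    not only powers of two."""
    leaves = [list(points)]
    while len(leaves) < k:
        # Pick the leaf to bisect; real implementations often use SSE.
        leaves.sort(key=len, reverse=True)
        target = leaves.pop(0)
        left, right = two_means_1d(target)
        leaves += [left, right]
    return leaves

clusters = bisecting_kmeans([1, 2, 3, 10, 11, 12, 20], k=3)
print(len(clusters))  # -> 3
```

In a distributed setting, the expensive part (the 2-means run on the chosen leaf) is what gets parallelized with map and reduce steps; the leaf-selection loop itself stays sequential.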

Thank you



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Prototype-Implementation-of-Hierarchical-Clustering-on-Spark-tp24467p27833.html



Microbatches length

2016-10-20 Thread pcandido
Hello folks,

I'm using Spark Streaming, and my question is simple:
The documentation says that micro-batches arrive at fixed intervals of
wall-clock time (seconds, minutes). I would like micro-batches of the same
length (number of records) instead. Can I configure Spark Streaming to return
a micro-batch once it reaches a determined length?
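As far as I know, Spark Streaming batch intervals are time-based only; there is no built-in count-based trigger. A common workaround is to re-chunk records in the application, buffering until a target length is reached. A sketch of that idea in plain Python (`CountBatcher` is a hypothetical helper, not a Spark API):

```python
class CountBatcher:
    """Re-chunks a time-based stream into fixed-length batches by
    buffering records and emitting only full groups of `size`.
    Leftover records wait for the next micro-batch."""

    def __init__(self, size):
        self.size = size
        self.buffer = []

    def feed(self, micro_batch):
        """Add one (variable-length) micro-batch; return the list of
        complete fixed-length batches that are now ready."""
        self.buffer.extend(micro_batch)
        ready = []
        while len(self.buffer) >= self.size:
            ready.append(self.buffer[:self.size])
            self.buffer = self.buffer[self.size:]
        return ready

batcher = CountBatcher(size=3)
print(batcher.feed([1, 2]))           # -> [] (not enough records yet)
print(batcher.feed([3, 4, 5, 6, 7]))  # -> [[1, 2, 3], [4, 5, 6]]
```

In a real job, the feed step would run inside whatever per-batch hook the application already has (e.g. foreachRDD), with the buffer kept in stateful storage rather than local memory.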

Thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Microbatches-length-tp27927.html