The coming data on Spark Streaming
Hi everyone,

I'm getting started with Spark Streaming and would like to ask a few things about how data arrives. I understand that Spark Streaming uses micro-batches: records are received by the workers and stored as an RDD. At defined intervals, the driver receives a pointer to the micro-batch RDD and can process the data with mappers and reducers.

1. Before the driver is called, can I work on the data? Can I do something with each object as it arrives at the workers?
2. A data stream is normally an ordered sequence of records, but when it arrives in micro-batches I receive many objects at once. How can I determine the order of the objects inside a batch? Can I extract an arrival timestamp or an ordered ID for each object?

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/The-coming-data-on-Spark-Streaming-tp27720.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
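The ordering workaround being asked about in question 2 can be sketched as follows. This is a plain-Python illustration of the idea, not the Spark API: Spark Streaming does not itself guarantee any ordering of records inside a micro-batch, so a common approach is to tag each record with a sequence ID and timestamp at the point of reception. The `tag_record` helper here is hypothetical.

```python
import itertools
import time

# Global arrival counter; in a real receiver this would live in the
# receiver's own state, one counter per receiver.
_seq = itertools.count()

def tag_record(record):
    """Attach an arrival sequence ID and timestamp to one record."""
    return {"seq": next(_seq), "ts": time.time(), "value": record}

# Simulate a micro-batch of records arriving at a worker.
incoming = ["a", "b", "c"]
batch = [tag_record(r) for r in incoming]

# Later, during batch processing, the original arrival order can be
# recovered by sorting on the sequence ID.
ordered = sorted(batch, key=lambda r: r["seq"])
print([r["value"] for r in ordered])  # ['a', 'b', 'c']
```

The same tagging trick answers both questions at once: the per-record work happens at arrival time (inside the tagging step), and the tag preserves the order for later batch processing.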
Re: The coming data on Spark Streaming
Anybody?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/The-coming-data-on-Spark-Streaming-tp27720p27771.html
Re: Prototype Implementation of Hierarchical Clustering on Spark
Hello,

Could you tell me how you applied MapReduce to bisecting k-means? I know how the classical algorithm works, but how did you parallelize the processing? Are all leaf nodes divided at the same time? If not, how? If so, how do you handle the last nodes? Dividing every leaf node at each iteration, you always have 2^i nodes after i iterations. What do you do if k = 7?

Thank you

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Prototype-Implementation-of-Hierarchical-Clustering-on-Spark-tp24467p27833.html
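The "what if k = 7?" concern can be sketched with a minimal, single-machine version of the bisecting k-means control loop (this is an illustration of the standard algorithm, not the prototype implementation being asked about). The usual answer is that you do not have to split every leaf each iteration: splitting one leaf at a time, typically the one with the largest sum of squared errors, adds exactly one cluster per step, so any k is reachable. In a distributed version, the inner 2-means step is what would be parallelized (point assignment as a map, centroid update as a reduce).

```python
def two_means(points, iters=10):
    """Plain 2-means on a list of 1-D points; returns the two halves."""
    a, b = min(points), max(points)  # seed centroids at the extremes
    for _ in range(iters):
        left = [p for p in points if abs(p - a) <= abs(p - b)]
        right = [p for p in points if abs(p - a) > abs(p - b)]
        if left:
            a = sum(left) / len(left)
        if right:
            b = sum(right) / len(right)
    return left, right

def sse(points):
    """Sum of squared errors of a cluster around its mean."""
    if not points:
        return 0.0
    mean = sum(points) / len(points)
    return sum((p - mean) ** 2 for p in points)

def bisecting_kmeans(points, k):
    clusters = [points]
    while len(clusters) < k:
        # Split only ONE leaf per iteration: the one with the largest
        # SSE. Each split adds exactly one cluster, so k need not be a
        # power of two.
        worst = max(clusters, key=sse)
        clusters.remove(worst)
        left, right = two_means(worst)
        clusters += [c for c in (left, right) if c]
    return clusters

data = [1, 2, 3, 10, 11, 12, 20, 21, 22, 30, 31, 32,
        40, 41, 42, 50, 51, 52, 60, 61, 62]
result = bisecting_kmeans(data, 7)
print(len(result))  # 7
```

Whether the Spark prototype splits one leaf at a time or all leaves per level is exactly the question above; the sketch only shows why a one-at-a-time strategy makes k = 7 unproblematic.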
Microbatches length
Hello folks,

I'm using Spark Streaming, and my question is simple. The documentation says that micro-batches arrive at intervals defined in real time (minutes, seconds). I want micro-batches of the same length: can I configure Spark Streaming to return a micro-batch once it reaches a given number of records?

Thanks.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Microbatches-length-tp27927.html
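As far as I know, the DStream API only supports time-based batch intervals, not count-based ones, so the usual workaround is to buffer records yourself and emit a "batch" only once it reaches a fixed length. A plain-Python sketch of that buffering idea (not a Spark API call; `fixed_length_batches` is a hypothetical helper):

```python
def fixed_length_batches(stream, batch_size):
    """Group an incoming record stream into batches of batch_size records."""
    buffer = []
    for record in stream:
        buffer.append(record)
        if len(buffer) == batch_size:
            yield buffer
            buffer = []
    # Records left over when the stream ends form a final, shorter batch.
    if buffer:
        yield buffer

batches = list(fixed_length_batches(range(10), 4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

In Spark Streaming itself, the equivalent trick is usually done upstream (e.g. having the producer emit fixed-size groups) or by accumulating records across time-based micro-batches with stateful operations until the count is reached.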