Thanks for sharing here.
Sent from my iPhone5s
> On March 21, 2014, at 20:44, Sanjay Awatramani wrote:
Hi,
I searched more articles and ran a few examples, and have clarified my doubts.
This answer by TD in another thread
( https://groups.google.com/d/msg/spark-users/GQoxJHAAtX4/0kiRX0nm1xsJ ) helped
me a lot.
Here is the summary of my findings:
1) A DStream can consist of 0, 1 or more RDDs.
2) E
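A small way to see this in practice is to look at the RDD behind each batch. This is only a sketch under my own assumptions: an existing JavaDStream<String> called lines and a Spark version recent enough for the lambda-friendly Java API.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Time;

// Spark generates one RDD per batch interval; a batch with no input data
// still produces an RDD, it is simply empty.
lines.foreachRDD((JavaRDD<String> rdd, Time time) -> {
    if (rdd.isEmpty()) {
        System.out.println("Batch at " + time + ": empty RDD");
    } else {
        System.out.println("Batch at " + time + ": " + rdd.count() + " records");
    }
});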
I don't see an example, but conceptually it looks like you'll need a suitable
structure such as a Monoid. I mean, if it's not tied to a window, it's an
overall computation that has to grow over time (otherwise it would land in the
batch world, see below), and that will be the purpose of such a structure.
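A rough sketch of what such a monoid-style, ever-growing computation could look like with updateStateByKey; this is only an illustration under my assumptions (a recent Spark with the Java 8 API, an existing JavaPairDStream<String, Long> named pairs, and a checkpoint directory already set on the context):

import java.util.List;
import org.apache.spark.api.java.Optional;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.api.java.JavaPairDStream;

// Monoid-ish update: 0L plays the identity, addition the associative combine.
Function2<List<Long>, Optional<Long>, Optional<Long>> combine =
    (newValues, state) -> {
        long sum = state.isPresent() ? state.get() : 0L; // identity when no state yet
        for (Long v : newValues) {
            sum += v;                                    // associative combine
        }
        return Optional.of(sum);
    };

// Running totals per key across all batches, not tied to any window.
JavaPairDStream<String, Long> runningTotals = pairs.updateStateByKey(combine);

The identity plus the associative combine are what let each new batch be folded into the accumulated state without reprocessing old data.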
On Thu, Mar 20, 2014 at 11:57 AM, andy petrella wrote:
One thing I wonder: imagine I want to sub-divide RDDs in a DStream into s
Also consider creating pairs and using the *byKey* operators; the key will then
be the structure that is used to consolidate or deduplicate your data.
my2c
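Purely as an illustration (the input stream, the extractId helper and the choice of reduce function are hypothetical), pair-plus-reduceByKey deduplication could look roughly like this:

import scala.Tuple2;
import org.apache.spark.streaming.api.java.JavaPairDStream;

// events is an existing JavaDStream<String>; extractId is a hypothetical
// function that pulls out the field identifying duplicates.
JavaPairDStream<String, String> keyed =
    events.mapToPair(line -> new Tuple2<>(extractId(line), line));

// Keep one value per key within each batch; "first one wins" is an arbitrary
// but associative choice.
JavaPairDStream<String, String> deduplicated =
    keyed.reduceByKey((a, b) -> a);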
On Thu, Mar 20, 2014 at 11:50 AM, Pascal Voitot Dev <pascal.voitot@gmail.com> wrote:
Actually it's quite simple...
DStream[T] is a stream of RDD[T].
So applying count on a DStream is just applying count on each RDD of this
DStream.
So at the end of count, you have a DStream[Long] containing the same number
of RDDs as before, but each RDD contains just one element: the count result.
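A small sketch of this in the Java API, assuming an existing JavaDStream<String> named lines (the variable names are mine):

import org.apache.spark.streaming.api.java.JavaDStream;

// count() produces one RDD per batch, each holding a single Long value:
// the number of elements in that batch's RDD.
JavaDStream<Long> counts = lines.count();
counts.print(); // prints one number per batch interval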
@TD: I do not need multiple RDDs in a DStream in every batch. On the contrary,
my logic would work fine if there is only 1 RDD. But then the description of
functions like reduce & count ("Return a new DStream of single-element RDDs by
counting the number of elements in each RDD of the source DStream") implies
that a DStream can contain multiple RDDs.
If I may add my contribution to this discussion, if I understand your question
well...
DStream means discretized stream: it discretizes the data stream over windows
of time (according to the project code and the paper I've read). So when you
write:
JavaStreamingContext stcObj = new JavaStreamingContext(confObj, new Duration(60 * 60 * 1000));
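To make the discretization concrete, here is a rough sketch under my own assumptions (the SparkConf setup, the folder path and the 3-hour window are made up for illustration) showing a 1-hour batch interval together with an explicit sliding window on top of it:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

SparkConf confObj = new SparkConf().setAppName("HourlyLogs");
// The Duration given to the context is the batch interval: the stream is
// cut into one RDD per hour.
JavaStreamingContext stcObj =
    new JavaStreamingContext(confObj, new Duration(60 * 60 * 1000));

JavaDStream<String> logs = stcObj.textFileStream("/path/to/log/folder");

JavaDStream<Long> perBatch = logs.count();              // one count per 1-hour batch
JavaDStream<Long> perWindow = logs
    .window(new Duration(3 * 60 * 60 * 1000),           // 3-hour window...
            new Duration(60 * 60 * 1000))               // ...sliding every batch
    .count();

The batch interval fixes how the stream is discretized; window() only regroups the already-cut batches, it does not change the cutting itself.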
That is a good question. If I understand correctly, you need multiple RDDs
from a DStream in *every batch*. Can you elaborate on why you need multiple
RDDs in every batch?
TD
On Wed, Mar 19, 2014 at 10:20 PM, Sanjay Awatramani wrote:
Hi,
As I understand it, a DStream consists of 1 or more RDDs, and foreachRDD will
run a given function on each and every RDD inside a DStream.
I created a simple program which reads log files from a folder every hour:
JavaStreamingContext stcObj = new JavaStreamingContext(confObj, new Duration(60 * 60 * 1000));
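Continuing that sketch under my own assumptions (the folder path and the ERROR filter are hypothetical, and stcObj is the context created above), the per-batch processing could look like this:

import org.apache.spark.streaming.api.java.JavaDStream;

JavaDStream<String> logLines = stcObj.textFileStream("/path/to/log/folder");

// The function passed to foreachRDD runs once per batch interval, on the
// single RDD that Spark generated for that hour's worth of new files
// (which may be empty if nothing arrived).
logLines.foreachRDD(rdd -> {
    long errors = rdd.filter(line -> line.contains("ERROR")).count();
    System.out.println("ERROR lines in this hourly batch: " + errors);
});

stcObj.start();
stcObj.awaitTermination();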