I want to keep track of the number of events processed in each batch.

Why does 'globalCount' work for a DStream? I think a similar construct won't
work for an RDD; that's why accumulators exist.
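For context, the difference can be sketched roughly as follows (assuming an existing SparkContext `sc`; this uses the classic `sc.accumulator` API from Spark 1.x, which matches the Spark version discussed in this thread):

```scala
// A plain driver-side var does NOT work inside RDD operations:
// the closure is serialized and shipped to the executors, so each
// task increments its own deserialized copy and the driver's
// variable is never updated.
var localCount = 0L
sc.parallelize(1 to 100).foreach(_ => localCount += 1)
// localCount is typically still 0 here on the driver

// An accumulator is the shared-variable mechanism for that case:
// executors add to it, and the driver reads the merged value.
val acc = sc.accumulator(0L, "events")
sc.parallelize(1 to 100).foreach(_ => acc += 1L)
// acc.value on the driver now reflects all the increments

// TD's DStream snippet is different: the body passed to foreachRDD
// runs on the DRIVER, once per batch, so updating globalCount there
// is safe. Only code inside RDD actions/transformations (like the
// foreach above) runs on the executors.
```

So `globalCount` works in the DStream case only because `foreachRDD` executes its function on the driver; the same pattern inside an RDD operation would silently do nothing.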


On Fri, Aug 8, 2014 at 12:52 AM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> Do you mean that you want a continuously updated count as more
> events/records are received in the DStream (remember, DStream is a
> continuous stream of data)? Assuming that is what you want, you can use a
> global counter
>
> var globalCount = 0L
>
> dstream.count().foreachRDD(rdd => { globalCount += rdd.first() } )
>
> This globalCount variable will reside in the driver and will keep being
> updated after every batch.
>
> TD
>
>
> On Thu, Aug 7, 2014 at 10:16 PM, Soumitra Kumar <kumar.soumi...@gmail.com>
> wrote:
>
>> Hello,
>>
>> I want to count the number of elements in a DStream, like RDD.count().
>> Since DStream has no method that returns the count directly, I thought of
>> using DStream.count and an accumulator.
>>
>> How do I do DStream.count() to count the number of elements in a DStream?
>>
>> How do I create a shared variable in Spark Streaming?
>>
>> -Soumitra.
>>
>
>