I want to keep track of the events processed in a batch. How come 'globalCount' works for a DStream? I think a similar construct won't work for an RDD; that's why accumulators exist.
On Fri, Aug 8, 2014 at 12:52 AM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> Do you mean that you want a continuously updated count as more
> events/records are received in the DStream (remember, DStream is a
> continuous stream of data)? Assuming that is what you want, you can use a
> global counter
>
>     var globalCount = 0L
>
>     dstream.count().foreachRDD(rdd => { globalCount += rdd.first() })
>
> This globalCount variable will reside in the driver and will keep being
> updated after every batch.
>
> TD
>
> On Thu, Aug 7, 2014 at 10:16 PM, Soumitra Kumar <kumar.soumi...@gmail.com> wrote:
>
>> Hello,
>>
>> I want to count the number of elements in the DStream, like RDD.count().
>> Since there is no such method in DStream, I thought of using DStream.count
>> and use the accumulator.
>>
>> How do I do DStream.count() to count the number of elements in a DStream?
>>
>> How do I create a shared variable in Spark Streaming?
>>
>> -Soumitra.
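To answer the question above: 'globalCount' works here because the function passed to foreachRDD runs on the driver, once per batch, so the driver-local var is updated in place. A plain var would not work inside an RDD transformation (e.g. rdd.map or rdd.foreach), because that closure is serialized and shipped to executors, each of which mutates its own copy; that is the case accumulators handle. A minimal sketch contrasting the two approaches (names like CountEvents and the socket source are illustrative, not from the thread; uses the 2014-era Spark Streaming API):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CountEvents {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CountEvents").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    val dstream = ssc.socketTextStream("localhost", 9999)

    // Approach 1: driver-side variable. Safe because foreachRDD's body
    // executes on the driver, once per batch interval.
    var globalCount = 0L
    dstream.count().foreachRDD { rdd =>
      // dstream.count() yields a DStream whose each RDD holds a single
      // element: the number of records in that batch.
      globalCount += rdd.first()
      println(s"Total events so far: $globalCount")
    }

    // Approach 2: accumulator. Needed when the update happens inside code
    // that runs on executors, e.g. per-record processing.
    val eventAcc = ssc.sparkContext.accumulator(0L, "events")
    dstream.foreachRDD { rdd =>
      rdd.foreach(_ => eventAcc += 1L) // runs on executors; merged on driver
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note the asymmetry: in approach 1 the addition itself happens on the driver, so no accumulator is required; in approach 2 the addition happens on executors, so only an accumulator (merged back to the driver) gives a correct total.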