Hi Jason,
Thanks for the response. I believe I can look into a Redis-based solution
for storing this state externally. However, would it be possible to refresh
this from the store with every batch, i.e., what code can be written inside
the pipeline to fetch this info from the external store? Also, s
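[For reference, one common pattern for the per-batch refresh asked about here is to read the state on the driver inside `foreachRDD`, broadcast it to the executors, and write the updated value back at the end of the batch. A minimal Scala sketch, assuming a Jedis client and an illustrative key name `stats:count` (not from this thread):

```scala
import redis.clients.jedis.Jedis

// `lines` is an assumed DStream[String]; "stats:count" is a
// hypothetical Redis key used purely for illustration.
lines.foreachRDD { rdd =>
  // Runs on the driver once per batch: fetch the current state.
  val jedis = new Jedis("localhost", 6379)
  val prev = Option(jedis.get("stats:count")).map(_.toLong).getOrElse(0L)

  // Broadcast so executors can use it without each re-querying Redis.
  val prevBc = rdd.sparkContext.broadcast(prev)

  val batchCount = rdd.count() // stands in for the real per-batch computation

  // Write the updated statistic back; the next batch will see it.
  jedis.set("stats:count", (prevBc.value + batchCount).toString)
  jedis.close()
}
```

Because `foreachRDD`'s closure body runs on the driver, the Redis round-trip happens once per batch rather than once per record.]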
Hi Nikunj,
Depending on what kind of stats you want to accumulate, you may want to
look into the Accumulator/Accumulable API, or if you need more control, you
can store these things in an external key-value store (HBase, Redis, etc.)
and do careful updates there. Though be careful and make sure y
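[A rough illustration of the Accumulator route mentioned above, using the Spark 1.x `accumulator` API; `ssc` (a StreamingContext) and `lines` (a DStream[String]) are assumed, and the caveat in a comment is one of the things to "be careful" about:

```scala
// Create the accumulator on the driver; it is updated on executors
// but only reliably readable on the driver.
val recordCount = ssc.sparkContext.accumulator(0L, "recordCount")

lines.foreachRDD { rdd =>
  rdd.foreach { _ => recordCount += 1L } // incremented on executors
  // Read on the driver after the action completes. Note: task retries
  // can re-apply increments, so accumulator values inside transformations
  // are approximate; only use them for stats you can tolerate over-counting.
  println(s"records so far: ${recordCount.value}")
}
```
]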
Hi all,
I have the following use case that I wanted to get some insight on: how to
go about doing this in Spark Streaming.
Every batch is processed through the pipeline and at the end, it has to
update some statistics information. This updated info should be reusable in
the next batch of this DStream e
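[If the statistics are keyed by something, Spark Streaming's built-in way to carry state from one batch to the next is `updateStateByKey`, which requires checkpointing. A minimal word-count-style sketch, with `ssc` and a DStream[String] `lines` assumed and the checkpoint path purely illustrative:

```scala
ssc.checkpoint("/tmp/ckpt") // required for stateful operations; path is illustrative

val counts = lines
  .flatMap(_.split("\\s+"))
  .map(w => (w, 1L))
  .updateStateByKey[Long] { (newVals: Seq[Long], state: Option[Long]) =>
    // Merge this batch's values into the running state for the key.
    Some(state.getOrElse(0L) + newVals.sum)
  }

counts.print() // running totals, carried across batches by Spark itself
```

The trade-off versus an external store is that the state lives inside Spark's checkpoints, so it is not visible to other systems, but no external reads or writes are needed per batch.]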