I have much the same issue.
While I haven't totally solved it yet, I've found the "window" method useful
for batching up archive blocks, but updateStateByKey is probably what we want
to use, perhaps multiple times - if that works.
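For what it's worth, here is a rough sketch of both approaches against a toy DStream. The stream source, durations, and the word-count payload are all placeholders - substitute your own; window() batches recent data, while updateStateByKey() carries state across every batch:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("MergeSketch")
val ssc  = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("checkpoint")  // updateStateByKey requires a checkpoint dir

// Placeholder source; any DStream of pairs works the same way
val pairs = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map((_, 1L))

// window(): one RDD per slide covering the last 60s of data
val windowed = pairs.window(Seconds(60), Seconds(10)).reduceByKey(_ + _)

// updateStateByKey(): a running total merged across ALL batches so far
val totals = pairs.updateStateByKey[Long] { (newValues, state) =>
  Some(newValues.sum + state.getOrElse(0L))
}
totals.print()

ssc.start()
ssc.awaitTermination()
```

The difference that matters here: windowed state is bounded (old batches fall out of the window), whereas updateStateByKey keeps everything, which is why it needs checkpointing.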
My bigger worry now is storage. Unlike non-streaming apps, we
Hi,
I'm wondering whether it's possible to continuously merge the RDDs coming
from a stream into a single RDD efficiently.
One thought is to use the union() method. But using union, I will get a new
RDD each time I do a merge. I don't know how I should name these RDDs,
because I remember Spark do
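On the naming question, one pattern I've seen is to not name the intermediate RDDs at all: keep a single var pointing at the running union and reassign it inside foreachRDD. A hedged sketch, assuming an existing StreamingContext "ssc" and DStream[String] "stream" (the threshold is made up):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

var merged: RDD[String] = ssc.sparkContext.emptyRDD[String]

stream.foreachRDD { rdd =>
  // union() itself is cheap (no shuffle), but every call adds a step to the
  // lineage, so persist the result and checkpoint occasionally to truncate
  // the growing DAG (requires sc.setCheckpointDir to be set).
  merged = merged.union(rdd).persist(StorageLevel.MEMORY_AND_DISK)
  if (merged.getNumPartitions > 1000) {  // hypothetical trigger
    merged.checkpoint()
  }
}
```

The var reassignment sidesteps naming entirely; the real cost of repeated union is the unbounded lineage and partition count, not the names.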