Is it possible to achieve serial batching with Spark Streaming?
Example:
I configure the Streaming Context for creating a batch every 3 seconds.
Processing of batch #2 takes longer than 3 seconds and creates a
backlog of batches:
batch #1 takes 2s
batch #2 takes 10s
batch #3 takes 2s
batch #4 takes 2s
When testing locally, it seems that multiple batches finish processing
at the same time:
batch #1 finished at 2s
batch #2 finished at 12s
batch #3 finished at 12s (processed in parallel)
batch #4 finished at 15s
How can I delay processing of the next batch so that it starts only
after processing of the previous batch has completed?
batch #1 finished at 2s
batch #2 finished at 12s
batch #3 finished at 14s (processed serially)
batch #4 finished at 16s
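To make the arithmetic behind the desired timeline explicit, here is a small sketch in plain Java. It is not Spark code: SerialTimeline and finishTimes are hypothetical names, and it assumes every batch is already waiting in the backlog when the previous one completes, so each batch starts the moment its predecessor finishes.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Spark: models strictly serial batch processing.
class SerialTimeline {
    // Assumption: each batch is already queued when its predecessor completes,
    // so the finish time of a batch is the running sum of processing times.
    static long[] finishTimes(long[] processingSeconds) {
        long[] finished = new long[processingSeconds.length];
        long clock = 0;
        for (int i = 0; i < processingSeconds.length; i++) {
            clock += processingSeconds[i]; // batch i starts when batch i-1 ends
            finished[i] = clock;
        }
        return finished;
    }

    public static void main(String[] args) {
        // Processing times of batches #1..#4 from the example above.
        long[] finished = finishTimes(new long[] {2, 10, 2, 2});
        System.out.println(Arrays.toString(finished)); // prints [2, 12, 14, 16]
    }
}
```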
I want to perform a transformation for every key only once in a given
period of time (e.g. the batch duration). I find all unique keys in a
batch and perform the transformation on each key. To ensure that the
transformation is done only once per key, only one batch can be
processed at a time. At the same time, I still want the work within
that single batch to be processed in parallel.
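To illustrate the semantics I am after, here is a sketch in plain Java rather than Spark. All names in it (SerialBatches, processBatch, runSerially) are hypothetical, and toUpperCase only stands in for the real transformation. Batches are handed to a single worker thread so they run one after another, while the unique keys inside each batch are transformed in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration in plain Java; none of this is Spark API.
class SerialBatches {
    // Highest number of batches ever observed running at the same time.
    static final AtomicInteger maxConcurrentBatches = new AtomicInteger();
    private static final AtomicInteger running = new AtomicInteger();

    // Processes one batch: deduplicate the keys, then transform each key in parallel.
    static Set<String> processBatch(List<String> keys) {
        int now = running.incrementAndGet();
        maxConcurrentBatches.accumulateAndGet(now, Math::max);
        try {
            Set<String> transformed = ConcurrentHashMap.newKeySet();
            keys.stream().distinct().parallel()
                .forEach(k -> transformed.add(k.toUpperCase())); // placeholder transformation
            return new TreeSet<>(transformed);
        } finally {
            running.decrementAndGet();
        }
    }

    // Submits every batch to a single worker thread, so batches never overlap.
    static List<Set<String>> runSerially(List<List<String>> batches) {
        ExecutorService oneAtATime = Executors.newSingleThreadExecutor();
        try {
            List<Future<Set<String>>> pending = new ArrayList<>();
            for (List<String> batch : batches) {
                Callable<Set<String>> job = () -> processBatch(batch);
                pending.add(oneAtATime.submit(job));
            }
            List<Set<String>> results = new ArrayList<>();
            for (Future<Set<String>> f : pending) {
                results.add(f.get()); // wait for batches in submission order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            oneAtATime.shutdown();
        }
    }
}
```

The single-threaded executor is what serializes the batches here; the parallel stream is what keeps each individual batch parallel. What I am looking for is the equivalent behaviour in Spark Streaming.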
context = new JavaStreamingContext(conf, Durations.seconds(10));
stream = context.receiverStream(...);
stream
    .reduceByKey(...)
    .transform(...)
    .foreachRDD(output);
Any ideas or pointers are very welcome.
Thanks!