This has been discussed in a number of threads on this mailing list. Here
is a summary.
1. Processing of batch T+1 always starts after all the processing of batch
T has completed. But here a "batch" is defined by the data of all the
receivers running in the system, received within the batch interval.
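A minimal Python sketch of the model in point 1, assuming a single driver loop and in-memory records. The helpers `assign_batches` and `run_batches` are hypothetical and are not Spark APIs. Records from every receiver are bucketed by batch interval, and batch T+1 is only handed to the processing function after the call for batch T has returned.

```python
from collections import defaultdict


def assign_batches(records, batch_interval):
    """Group (receiver_id, timestamp, value) records into batches.

    Hypothetical helper: a batch holds the data of *all* receivers that
    arrived within the same interval [k * interval, (k + 1) * interval).
    """
    batches = defaultdict(list)
    for receiver_id, ts, value in records:
        batches[int(ts // batch_interval)].append((receiver_id, ts, value))
    return dict(batches)


def run_batches(batches, process):
    """Process batches strictly in batch-time order, one at a time.

    Hypothetical helper: batch T+1 is passed to `process` only after the
    call for batch T has returned, so processing never overlaps.
    """
    log = []
    for batch_time in sorted(batches):
        log.append(("start", batch_time))
        process(batch_time, batches[batch_time])
        log.append(("end", batch_time))
    return log


# Two receivers feeding one stream with a 1-second batch interval.
records = [("r1", 0.5, "a"), ("r2", 1.2, "b"), ("r1", 1.9, "c"), ("r2", 2.1, "d")]
batches = assign_batches(records, 1.0)
log = run_batches(batches, lambda t, data: None)
```

Here `batches` maps 0, 1 and 2 to the records of both receivers that fall in each interval, and `log` shows each batch ending before the next one starts. Real Spark Streaming schedules jobs on a cluster; this sketch only models the ordering guarantee.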
Can anyone shed some light on HOW SPARK DOES *ORDERING OF BATCHES*?
On Sat, Jul 11, 2015 at 9:19 AM, anshu shukla
wrote:
Thanks Ayan,
I was curious to know *how Spark does it*. Is there any *documentation*
where I can get the details? Could you please point me to a detailed
link?
Maybe it does something like *transactional topologies in Storm*
(https://storm.apache.org/documentation/Transact
AFAIK, it is guaranteed that batch t+1 will not start processing until batch
t is done.
Ordering within a batch: what do you mean by that? In essence, the (mini)
batch is distributed across partitions like a normal RDD, so following it
with rdd.zipWithIndex should give a way to order them by the time they a
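A small pure-Python model of the ordering `zipWithIndex` assigns, as described in Spark's RDD documentation: indices follow partition order first, then element order within each partition. The function `zip_with_index` and the list-of-lists "partitions" are hypothetical stand-ins for an RDD. Whether the resulting index matches arrival time depends on how received data is laid out across partitions, which this sketch simply assumes.

```python
from itertools import accumulate


def zip_with_index(partitions):
    """Simulate RDD.zipWithIndex on a list of partitions (plain lists).

    The first element of the first partition gets index 0 and the last
    element of the last partition gets the largest index.
    """
    sizes = [len(p) for p in partitions]
    # Starting index of each partition = total size of all earlier partitions.
    starts = [0] + list(accumulate(sizes))[:-1]
    return [
        [(elem, start + i) for i, elem in enumerate(part)]
        for part, start in zip(partitions, starts)
    ]


# A "batch" split into four partitions, one of them empty.
partitions = [["a", "b"], [], ["c"], ["d", "e"]]
indexed = zip_with_index(partitions)
# Flatten and sort by the assigned index to recover a total order.
ordered = [elem for elem, idx in sorted(
    (pair for part in indexed for pair in part), key=lambda pair: pair[1])]
```

Because every partition's starting offset depends on the sizes of the partitions before it, the real operation needs to know those sizes before it can hand out indices; empty partitions simply contribute nothing to the offsets.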