Re: Ordering of Batches in Spark streaming

2015-07-14 Thread Tathagata Das
This has been discussed in a number of threads on this mailing list. Here is a summary. 1. Processing of batch T+1 always starts after all the processing of batch T has completed. But here a "batch" is defined by the data of all the receivers running in the system receiving within the batch interv
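
To illustrate point 1, here is a minimal plain-Python sketch of the guarantee. It is not Spark code and does not use any Spark API: the function names `group_into_batches` and `process_in_order`, the record layout, and the sample data are all hypothetical and chosen only for illustration. It models a batch as the data from all receivers that arrived within one batch interval, and shows batches being processed strictly one after another.

```python
from collections import defaultdict


def group_into_batches(records, batch_interval):
    """Group (receiver_id, timestamp, value) records into batches.

    Hypothetical model: a batch holds the data from *all* receivers
    that arrived within the same batch interval.
    """
    batches = defaultdict(list)
    for receiver_id, timestamp, value in records:
        batch_index = int(timestamp // batch_interval)
        batches[batch_index].append((receiver_id, value))
    return dict(batches)


def process_in_order(batches, process):
    """Process batches one at a time, in batch-index order.

    Batch T+1 is only started after batch T has fully finished.
    """
    log = []
    for batch_index in sorted(batches):
        log.append(("start", batch_index))
        process(batches[batch_index])
        log.append(("end", batch_index))
    return log


# Illustrative data only: two receivers, timestamps in seconds.
records = [
    ("receiver-A", 0.2, "a1"),
    ("receiver-B", 0.7, "b1"),
    ("receiver-A", 1.1, "a2"),
    ("receiver-B", 1.9, "b2"),
    ("receiver-A", 2.5, "a3"),
]

batches = group_into_batches(records, batch_interval=1.0)
log = process_in_order(batches, process=lambda batch: None)
```

In this model the log always alternates start and end events for consecutive batch indices, which is the ordering property described in the message. How Spark schedules the work internally is not shown here.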

Re: Ordering of Batches in Spark streaming

2015-07-12 Thread anshu shukla
Can anyone shed some light on how Spark does *ordering of batches*? On Sat, Jul 11, 2015 at 9:19 AM, anshu shukla wrote: > Thanks Ayan, > > I was curious to know *how Spark does it*. Is there any *documentation* > where I can get the details about that? Will you please point me

Re: Ordering of Batches in Spark streaming

2015-07-10 Thread anshu shukla
Thanks Ayan, I was curious to know *how Spark does it*. Is there any *documentation* where I can get the details about that? Could you please point me to a detailed link, etc.? Maybe it does something like *transactional topologies in Storm* (https://storm.apache.org/documentation/Transact

Re: Ordering of Batches in Spark streaming

2015-07-10 Thread ayan guha
AFAIK, it is guaranteed that batch t+1 will not start processing until batch t is done. Ordering within a batch - what do you mean by that? In essence, the (mini) batch will get distributed in partitions like a normal RDD, so rdd.zipWithIndex should give a way to order them by the time they a
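
As a rough illustration of the zipWithIndex suggestion, the sketch below models in plain Python how indices are assigned: first by partition order, then by element order within each partition. It does not call Spark. The helper `zip_with_index` and the sample partitions are hypothetical, and whether partition order actually reflects arrival time in a given streaming job is an assumption taken from the message, not something this sketch verifies.

```python
def zip_with_index(partitions):
    """Model of RDD.zipWithIndex on a list of partitions.

    Indices are assigned first by partition order, then by the
    order of elements within each partition.
    """
    indexed = []
    next_index = 0
    for partition in partitions:
        for element in partition:
            indexed.append((element, next_index))
            next_index += 1
    return indexed


# Illustrative data only: one mini-batch split across three partitions.
partitions = [["e1", "e2"], ["e3"], ["e4", "e5"]]
indexed = zip_with_index(partitions)

# Sorting by the assigned index recovers the partition-then-position order.
ordered_elements = [element for element, _ in sorted(indexed, key=lambda pair: pair[1])]
```

With a real RDD the equivalent call would be `rdd.zipWithIndex()`, which returns pairs of (element, index).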