Re: how to choose right DStream batch interval

2014-09-10 Thread Tim Smith
http://www.slideshare.net/spark-project/deep-divewithsparkstreaming-tathagatadassparkmeetup20130617

Slide 39 covers it.

On Tue, Sep 9, 2014 at 9:23 PM, qihong wrote:
> Hi Mayur,
>
> Thanks for your response. I did write a simple test that set up a DStream with 5 batches; The batch duration

Re: how to choose right DStream batch interval

2014-09-09 Thread qihong
Hi Mayur, Thanks for your response. I wrote a simple test that sets up a DStream with 5 batches. The batch duration is 1 second, and the 3rd batch takes an extra 2 seconds. The output of the test shows that the 3rd batch causes a backlog, and Spark Streaming does catch up on the 4th and 5th batches (
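
The behaviour described in this test can be sketched without Spark at all. The snippet below is only a simplified model, not Spark's scheduler: it assumes batches are processed one at a time, in order, and that a batch becomes ready at the end of its interval. The function name `simulate` and the 100 ms baseline processing time are assumptions made for illustration; only the 1 second interval and the extra 2 seconds on the 3rd batch come from the message above.

```python
def simulate(interval_ms, processing_ms):
    """Return the scheduling delay (ms) of each batch when batches
    are processed sequentially, in arrival order."""
    delays = []
    free_at = 0  # time at which the previous batch finished processing
    for i, cost in enumerate(processing_ms, start=1):
        ready_at = i * interval_ms         # batch i is complete at the end of its interval
        start_at = max(ready_at, free_at)  # wait if the previous batch is still running
        delays.append(start_at - ready_at)
        free_at = start_at + cost
    return delays


# 1 s batches; the 100 ms baseline is an assumed figure,
# and the 3rd batch takes an extra 2 s on top of it.
delays = simulate(1000, [100, 100, 2100, 100, 100])
print(delays)  # [0, 0, 0, 1100, 200]
```

In this model the slow 3rd batch delays the 4th batch by 1.1 seconds, and the delay shrinks to 0.2 seconds by the 5th batch, which matches the catch-up pattern reported in the test.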

Re: how to choose right DStream batch interval

2014-09-07 Thread Mayur Rustagi
Spark will simply have a backlog of tasks; it will manage to process them nonetheless, though if it keeps falling behind, you may run out of memory or see unreasonable latency. For momentary spikes, Spark Streaming will manage. Mostly if you are looking to do 100% processing, you'll have to go with
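
The difference between a momentary spike and steadily falling behind can be illustrated with a small queueing sketch. This is a simplified model under stated assumptions, not Spark's actual scheduler: batches are assumed to be processed one at a time in order, and the function name `scheduling_delays` and all timing figures are hypothetical.

```python
def scheduling_delays(interval_ms, processing_ms):
    """Scheduling delay (ms) per batch under sequential, in-order processing."""
    delays = []
    free_at = 0  # time at which the previous batch finished processing
    for i, cost in enumerate(processing_ms, start=1):
        ready_at = i * interval_ms         # batch i is ready at the end of its interval
        start_at = max(ready_at, free_at)  # queue behind the previous batch if needed
        delays.append(start_at - ready_at)
        free_at = start_at + cost
    return delays


# Momentary spike: one 3 s batch among 0.5 s batches, 1 s interval.
spike = scheduling_delays(1000, [500, 500, 3000, 500, 500, 500, 500, 500])

# Sustained overload: every batch takes 1.2 s against a 1 s interval.
overload = scheduling_delays(1000, [1200] * 8)

print(spike)     # [0, 0, 0, 2000, 1500, 1000, 500, 0]
print(overload)  # [0, 200, 400, 600, 800, 1000, 1200, 1400]
```

With a single spike the delay drains back to zero because normal batches finish faster than the interval. When every batch takes longer than the interval, the delay grows with each batch and never recovers, which is the situation where queued data can exhaust memory.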

Re: how to choose right DStream batch interval

2014-09-05 Thread qihong
Repost, since the original message was marked with "This post has NOT been accepted by the mailing list yet." I have some questions regarding the DStream batch interval: 1. If it only takes 0.5 seconds to process a batch 99% of the time, but 1% of batches need 5 seconds to process (due to some random factor or f