http://www.slideshare.net/spark-project/deep-divewithsparkstreaming-tathagatadassparkmeetup20130617
Slide 39 covers it.
On Tue, Sep 9, 2014 at 9:23 PM, qihong wrote:
Hi Mayur,
Thanks for your response. I did write a simple test that set up a DStream with 5 batches. The batch duration is 1 second, and the 3rd batch takes an extra 2 seconds. The output of the test shows that the 3rd batch causes a backlog, and Spark Streaming does catch up on the 4th and 5th batches (
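To make the timing in this test concrete, here is a minimal Python sketch that replays it as a queue instead of running Spark. It assumes batches are processed strictly one at a time, in order, and uses a hypothetical 100 ms for a normal batch because the test's normal processing time isn't stated. `simulate` and `BatchTiming` are illustrative names, not Spark APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchTiming:
    batch: int      # 1-based batch number
    ready_ms: int   # when the batch interval closes and the job is submitted
    start_ms: int   # when processing actually begins
    finish_ms: int  # when processing completes
    delay_ms: int   # scheduling delay = start - ready


def simulate(interval_ms: int, processing_ms: List[int]) -> List[BatchTiming]:
    """Replay batches that run one at a time, in arrival order (an assumption)."""
    timings = []
    prev_finish = 0
    for i, proc in enumerate(processing_ms, start=1):
        ready = i * interval_ms
        start = max(ready, prev_finish)  # wait if the previous batch overran
        finish = start + proc
        timings.append(BatchTiming(i, ready, start, finish, start - ready))
        prev_finish = finish
    return timings


if __name__ == "__main__":
    # Hypothetical timings: 1 s interval, 100 ms per batch, batch 3 takes 2 s extra.
    for t in simulate(1000, [100, 100, 2100, 100, 100]):
        print(t)
```

With these numbers the scheduling delays come out as 0, 0, 0, 1100 and 200 ms: the backlog from batch 3 shrinks on batches 4 and 5 and would be gone by batch 6.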
Spark will simply have a backlog of tasks, and it will still manage to process them. However, if it keeps falling behind, you may run out of memory or see unreasonable latency. For momentary spikes, Spark Streaming will manage.
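The difference between a momentary spike and falling behind for good can be seen with Lindley's recursion for a single first-in-first-out queue: each batch waits max(0, previous wait + previous processing time - interval). A Python sketch under the same one-batch-at-a-time assumption, with hypothetical timings in milliseconds (`scheduling_delays` is an illustrative helper, not part of Spark):

```python
def scheduling_delays(interval_ms, processing_ms):
    """Delay before each batch starts, via Lindley's recursion.

    Assumes batches arrive exactly every interval_ms and are processed
    one at a time, in arrival order.
    """
    delays = []
    wait = 0
    for proc in processing_ms:
        delays.append(wait)
        # The next batch waits for whatever work spills past one interval.
        wait = max(0, wait + proc - interval_ms)
    return delays


if __name__ == "__main__":
    print(scheduling_delays(1000, [100, 100, 2100, 100, 100, 100]))  # [0, 0, 0, 1100, 200, 0]
    print(scheduling_delays(1000, [1200] * 5))                       # [0, 200, 400, 600, 800]
```

A one-off overrun drains away, while batches that always take longer than the interval add 200 ms of delay per batch without bound; that unbounded growth is where the memory and latency problems come from.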
Mostly, if you are looking to do 100% processing, you'll have to go with
(Reposting, since the original message was marked "This post has NOT been accepted by the mailing list yet.")
I have some questions regarding the DStream batch interval:
1. If it only takes 0.5 seconds to process a batch 99% of the time, but 1% of batches need 5 seconds to process (due to some random factor or f
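For question 1, a back-of-the-envelope check, assuming a 1-second batch interval (the question is cut off before stating one) and one batch at a time: the average processing time is 0.99 × 0.5 + 0.01 × 5 = 0.545 s, which is under the interval, so the stream keeps up on average. A single 5-second batch leaves a 4-second backlog, and each 0.5-second batch after it recovers 0.5 s, so draining it takes 8 normal batches. A Python sketch of that arithmetic (`recovery_after_spike` is a hypothetical helper):

```python
def recovery_after_spike(interval_ms, normal_ms, spike_ms):
    """Return (peak scheduling delay, normal batches needed to drain it) after one slow batch.

    Assumes batches arrive every interval_ms, run one at a time in order, the
    slow batch itself starts on time, and every other batch takes normal_ms.
    """
    if normal_ms >= interval_ms:
        raise ValueError("normal batches must finish within the interval to catch up")
    peak = max(0, spike_ms - interval_ms)  # delay seen by the batch right after the spike
    slack = interval_ms - normal_ms        # backlog removed by each normal batch
    batches_to_drain = -(-peak // slack)   # ceiling division on integers
    return peak, batches_to_drain


if __name__ == "__main__":
    # 1 s interval, 0.5 s normal batches, one 5 s batch (timings in ms).
    print(recovery_after_spike(1000, 500, 5000))  # (4000, 8)
```

This only covers an isolated spike; two slow batches close together would stack their backlogs before either drains.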