Just use a Kafka RDD (KafkaUtils.createRDD) in a batch job or two to work through the backlog, then start your streaming job.
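Something along these lines, as an untested sketch against the Spark 1.x direct Kafka API; the broker address, topic name, partitions, offsets, and the process() helper are all placeholders you'd fill in from your own offset bookkeeping:

import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

object CatchUpJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("kafka-catch-up"))

    // Direct API needs the broker list, not Zookeeper.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    // One OffsetRange per topic-partition. Start from the offsets your
    // streaming job last committed, and cap each batch run at an
    // intermediate offset so it covers roughly 1-2 hours of backlog.
    val offsetRanges = Array(
      OffsetRange("mytopic", 0, fromOffset = 1000L, untilOffset = 5000L),
      OffsetRange("mytopic", 1, fromOffset = 1200L, untilOffset = 5200L)
    )

    // Batch RDD over exactly the given offset ranges.
    val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, offsetRanges)

    // Apply the same per-record logic your streaming job uses.
    rdd.map(_._2).foreachPartition(_.foreach(process))

    sc.stop()
  }

  // Placeholder for your actual processing logic.
  def process(record: String): Unit = ()
}

Run that once or twice with successive offset ranges until you've caught up, then launch the streaming job with its fromOffsets set to wherever the last batch run left off.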
On Wed, Feb 17, 2016 at 12:57 AM, Abhishek Anand <abhis.anan...@gmail.com> wrote:
> I have a Spark Streaming application running in production. I am trying to
> find a solution for a particular use case: my application has a downtime
> of, say, 5 hours and is then restarted. When I start my streaming
> application after those 5 hours, there would be a considerable amount of
> data accumulated in Kafka, and my cluster would be unable to repartition
> and process all of it at once.
>
> Is there any workaround so that when my streaming application starts, it
> takes data for 1-2 hours and processes it, then takes the data for the
> next 1 hour and processes it? Once it has finished processing the 5 hours
> of data it missed, normal streaming should resume with the given slide
> interval.
>
> Please suggest any ideas and the feasibility of this.
>
> Thanks !!
> Abhi