http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
http://spark.apache.org/docs/latest/streaming-programming-guide.html#semantics-of-output-operations
https://www.youtube.com/watch?v=fXnNEq1v3VA

A rough sketch of the direct-approach + offset-range pattern is at the bottom of this mail, below the quoted question.

On Mon, Aug 10, 2015 at 4:32 PM, Shushant Arora <shushantaror...@gmail.com> wrote:
> Hi
>
> How can I avoid duplicate processing of Kafka messages in Spark Streaming 1.3
> when an executor fails?
>
> 1. Can I somehow access the accumulators of a failed task in the retry task, so it
> can skip the events that the failed task already processed on this partition?
>
> 2. Or will I have to persist each message as it is processed, check before processing
> each message whether the failed task already handled it, and delete this persisted
> information at the end of each batch?
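For what it's worth, here is a minimal sketch of the pattern those links describe for Spark 1.3: the direct (receiver-less) Kafka stream gives each batch a fixed offset range per partition, and the output operation is made idempotent or transactional against that range. The broker address, topic name, and the saveTransactionally sink are placeholders I'm assuming for illustration, not real APIs.

import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, TaskContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

object DirectKafkaNoDuplicates {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-kafka-dedup-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // assumed broker
    val topics = Set("events")                                      // assumed topic

    // Direct approach: no receivers, the driver decides which offsets each
    // batch covers, so a retried task re-reads exactly the same range.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.foreachRDD { rdd =>
      // One OffsetRange per Kafka partition for this batch.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      rdd.foreachPartition { records =>
        val range: OffsetRange = offsetRanges(TaskContext.get.partitionId)

        // Hypothetical sink: write the partition's results together with
        // (range.topic, range.partition, range.fromOffset, range.untilOffset)
        // in one transaction, or key the writes by offset so a retried task
        // overwrites instead of duplicating.
        // saveTransactionally(records, range)
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

On your option 1: accumulator values are only readable on the driver, not from other tasks, so a retry can't use them to skip work. Tying the write to the batch's offset range (as above) also avoids the per-message bookkeeping of option 2.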