Re: avoid duplicate due to executor failure in spark stream

2015-08-12 Thread Cody Koeninger
Accumulators aren't going to work to communicate state changes between executors. You need external storage.
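
Cody's point is that the only reliable place to keep "has this event already been posted?" state across task attempts is an external store. A minimal sketch of that idea, assuming a JDBC-reachable table processed_events with a primary key on event_key and a hypothetical postEvent() standing in for the real HTTP call (none of these names come from the thread):

    import java.sql.DriverManager

    object DedupPoster {
      // Placeholder for the real HTTP post to the external server.
      def postEvent(payload: String): Unit = println(s"posted: $payload")

      // Post an event at most once per eventKey by claiming the key in an
      // external table before posting; a retried task attempt hits the
      // primary-key constraint and skips the post.
      def postOnce(eventKey: String, payload: String, jdbcUrl: String): Unit = {
        val conn = DriverManager.getConnection(jdbcUrl)
        try {
          val stmt = conn.prepareStatement(
            "INSERT INTO processed_events (event_key) VALUES (?)")
          stmt.setString(1, eventKey)
          stmt.executeUpdate()
          postEvent(payload)
        } catch {
          case _: java.sql.SQLIntegrityConstraintViolationException =>
            () // already claimed by an earlier attempt; do nothing
        } finally {
          conn.close()
        }
      }
    }

The primary-key insert is what lets a retried attempt skip records the failed attempt already handled; for true exactly-once the post itself would also have to be idempotent or part of the same transaction as the key write.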

Re: avoid duplicate due to executor failure in spark stream

2015-08-11 Thread Shushant Arora
What if processing is neither idempotent nor in a transaction, say I am posting events to some external server after processing. Is it possible to get the accumulator of a failed task in the retry task? Is there any way to detect whether this task is a retried task or the original task? I was trying to achie…
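
On the second question, Spark does expose whether the currently running task is a re-execution: TaskContext.attemptNumber() is 0 on the first attempt and greater than 0 on a retry (available in the Spark 1.4-era releases current at the time of this thread). A minimal sketch; note it only detects the retry and does not recover the failed attempt's accumulator updates, which is why external storage is still needed:

    import org.apache.spark.{SparkConf, SparkContext, TaskContext}

    object RetryDetection {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("retry-detection").setMaster("local[2]"))
        val rdd = sc.parallelize(1 to 100, numSlices = 4)

        rdd.foreachPartition { records =>
          // attemptNumber() > 0 means this task is a re-execution of a failed task.
          if (TaskContext.get().attemptNumber() > 0) {
            // Consult external storage for records the failed attempt may
            // already have posted, instead of relying on accumulators.
          }
          records.foreach { r =>
            // post r to the external server here
          }
        }
        sc.stop()
      }
    }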

Re: avoid duplicate due to executor failure in spark stream

2015-08-10 Thread Cody Koeninger
http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers http://spark.apache.org/docs/latest/streaming-programming-guide.html#semantics-of-output-operations https://www.youtube.com/watch?v=fXnNEq1v3VA
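
The first link describes the receiver-less "direct" Kafka stream, where each RDD partition maps to a Kafka offset range that can be used as an idempotence key or committed transactionally together with the results, per the semantics discussion in the second link. A sketch against the spark-streaming-kafka 1.x API of that era; the broker address, topic name and the comments about where to write results are illustrative, not from the thread:

    import kafka.serializer.StringDecoder
    import org.apache.spark.{SparkConf, TaskContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

    object DirectKafkaSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("direct-kafka-sketch"), Seconds(10))

        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
        val stream = KafkaUtils.createDirectStream[
          String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("events"))

        stream.foreachRDD { rdd =>
          // Each partition of a direct stream corresponds 1:1 to a Kafka
          // offset range, so the range identifies exactly which records
          // this task attempt is responsible for.
          val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
          rdd.foreachPartition { records =>
            val range = offsetRanges(TaskContext.get().partitionId())
            // Write the partition's results together with
            // (range.topic, range.partition, range.fromOffset, range.untilOffset)
            // to external storage, either atomically (transactional output)
            // or keyed by the range (idempotent output).
            records.foreach(_ => ())
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }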