http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
http://spark.apache.org/docs/latest/streaming-programming-guide.html#semantics-of-output-operations
https://www.youtube.com/watch?v=fXnNEq1v3VA

A rough sketch of the direct-approach + offset-range pattern is at the bottom of this mail, below the quoted question.

On Mon, Aug 10, 2015 at 4:32 PM, Shushant Arora <shushantaror...@gmail.com> wrote:
> Hi
>
> How can I avoid duplicate processing of Kafka messages in Spark Streaming 1.3
> when an executor fails?
>
> 1. Can I somehow access the accumulators of a failed task in the retry task, so it
> can skip the events that the failed task already processed on this partition?
>
> 2. Or will I have to persist each message as it is processed, check before processing
> each message whether the failed task already handled it, and delete this persisted
> information at the end of each batch?
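For what it's worth, here is a minimal sketch of the pattern those links describe for Spark 1.3: the direct (receiver-less) Kafka stream gives each batch a fixed offset range per partition, and the output operation is made idempotent or transactional against that range. The broker address, topic name, and the saveTransactionally sink are placeholders I'm assuming for illustration, not real APIs.

import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, TaskContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

object DirectKafkaNoDuplicates {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-kafka-dedup-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // assumed broker
    val topics = Set("events")                                      // assumed topic

    // Direct approach: no receivers, the driver decides which offsets each
    // batch covers, so a retried task re-reads exactly the same range.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.foreachRDD { rdd =>
      // One OffsetRange per Kafka partition for this batch.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      rdd.foreachPartition { records =>
        val range: OffsetRange = offsetRanges(TaskContext.get.partitionId)

        // Hypothetical sink: write the partition's results together with
        // (range.topic, range.partition, range.fromOffset, range.untilOffset)
        // in one transaction, or key the writes by offset so a retried task
        // overwrites instead of duplicating.
        // saveTransactionally(records, range)
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

On your option 1: accumulator values are only readable on the driver, not from other tasks, so a retry can't use them to skip work. Tying the write to the batch's offset range (as above) also avoids the per-message bookkeeping of option 2.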