I think as long as you have adequate monitoring and Kafka retention, the
simplest solution is safest - let it crash.
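
FWIW the restart itself is pretty painless with the direct stream, since you
can seed it from whatever offsets your last committed transaction recorded.
Rough sketch below; loadCommittedOffsets is a made-up stand-in for a query
against your offset tracking table, and the broker/app names are placeholders:

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Hypothetical: load the (topic, partition) -> next-offset map that was
// committed in the same transaction as the last successful batch.
def loadCommittedOffsets(): Map[TopicAndPartition, Long] = ???

val ssc = new StreamingContext(
  new SparkConf().setAppName("direct-kafka-restart"), Seconds(10))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

// On (re)start, begin exactly where the last committed transaction left off.
val stream = KafkaUtils.createDirectStream[
    String, String, StringDecoder, StringDecoder, (String, String)](
  ssc,
  kafkaParams,
  loadCommittedOffsets(),
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))

// ... then the usual processing / transactional persist, ssc.start(), etc.
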
 On May 14, 2015 4:00 PM, "badgerpants" <mark.stew...@tapjoy.com> wrote:

> We've been using the new DirectKafkaInputDStream to implement an exactly-once
> processing solution that tracks the provided offset ranges within the same
> transaction that persists our data results. When an exception is thrown
> within the processing loop and the configured number of retries is exhausted,
> the stream will skip to the end of the failed range of offsets and continue
> on with the next RDD.
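
For reference, the pattern described above looks roughly like the sketch
below. persistResultsAndOffsets is a made-up stand-in for whatever
transactional write is actually being used; the point is just that the
results and the offset ranges commit together. Topic and broker names are
placeholders.

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

// Hypothetical: persist the batch's results and its offset ranges in one
// database transaction.
def persistResultsAndOffsets(results: Array[(String, String)],
                             offsets: Array[OffsetRange]): Unit = ???

val ssc = new StreamingContext(
  new SparkConf().setAppName("exactly-once"), Seconds(10))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

val stream = KafkaUtils.createDirectStream[
    String, String, StringDecoder, StringDecoder](ssc, kafkaParams, Set("events"))

stream.foreachRDD { rdd =>
  // Offset ranges are only available on the RDD produced directly by the stream.
  val offsets = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  val results = rdd.collect()  // stand-in for whatever processing produces the rows
  // A failure here rolls the transaction back, so the tracking table still
  // points at the start of this range.
  persistResultsAndOffsets(results, offsets)
}

// ... ssc.start(), ssc.awaitTermination()
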
>
> Makes sense, but we're wondering how others would handle recovering from
> failures. In our case the cause of the exception was a temporary outage of a
> needed service. Since the transaction rolled back at the point of failure,
> our offset tracking table retained the correct offsets, so we simply needed
> to restart the Spark process, whereupon it happily picked up at the correct
> point and continued. Short of a restart, do people have any good ideas for
> how we might recover?
>
> FWIW, we've looked at setting the spark.task.maxFailures param to a large
> value and looked for a property that would increase the wait between
> attempts. This might mitigate the issue when the availability problem is
> short-lived but wouldn't completely eliminate the need to restart.
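
Raising that one is just a conf setting, by the way; as a sketch:

import org.apache.spark.SparkConf

// Allow more task attempts before a stage is failed (the default is 4).
val conf = new SparkConf()
  .setAppName("direct-kafka-job")
  .set("spark.task.maxFailures", "16")
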
>
> Any thoughts, ideas welcome.
>
