There are cases where Spark Streaming job tasks fail (one, several, or all
tasks) and there's not much sense in progressing to the next job while
discarding the failed one. For example, when failing to connect to a remote
target DB, I would like to either fail fast and relaunch the application
from the last valid checkpoint once the problem is fixed, or back off and
retry the same job until it succeeds (or some condition is met).

Any ideas if/how this could be achieved? 
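
For context, here is roughly the backoff-and-retry shape I have in mind. This
is only a minimal sketch: writeToTargetDb is a hypothetical stand-in for my
output action and the retry parameters are made up. The fail-fast half would
presumably pair with StreamingContext.getOrCreate against the checkpoint
directory when relaunching.

import org.apache.spark.rdd.RDD

// Hypothetical record type and output action; the real job writes to a remote DB.
case class Event(key: String, value: String)

def writeToTargetDb(rdd: RDD[Event]): Unit = {
  rdd.foreachPartition { partition =>
    // open a connection to the target DB, write the partition, close the connection;
    // a connection failure surfaces on the driver as a failed Spark action
  }
}

// Retry a block with exponential backoff; once attempts are exhausted the
// exception propagates, so the batch (and, if we choose, the application) fails.
def withRetry(maxAttempts: Int, initialBackoffMs: Long)(body: => Unit): Unit = {
  var attempt = 1
  var done = false
  while (!done) {
    try { body; done = true }
    catch {
      case _: Exception if attempt < maxAttempts =>
        Thread.sleep(initialBackoffMs * (1L << (attempt - 1)))
        attempt += 1
    }
  }
}

// usage inside the streaming job definition:
// eventStream.foreachRDD { rdd =>
//   withRetry(maxAttempts = 5, initialBackoffMs = 1000L) { writeToTargetDb(rdd) }
// }

One concern with this is that retrying inside foreachRDD blocks the output
operation, so subsequent batches just queue up behind it, which is why I'm
also asking about fail-fast.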

More specific questions about fail-fast:

1. Is there a way for the driver program to get notified of a job failure
(which tasks failed / RDD metadata) before the checkpoint is updated?
StreamingContext has an addStreamingListener method, but its onBatchCompleted
event gives no indication of batch failure (only completion). See the
listener sketch after this list.

2. Can job scheduling be affected through public APIs?

3. What's the best way to stop and exit the application? When running in
yarn-client mode, calling streamingContext.stop stops the processing but the
process does not exit (see the stop-and-exit sketch after this list).

Thanks!