There are cases where Spark Streaming job tasks fail (one, several, or all of them), and it makes little sense to move on to the next job while discarding the failed one. For example, when the job fails to connect to the remote target DB, I would like to either fail fast and relaunch the application from the last valid checkpoint once the problem is fixed, or back off and retry the same job until it succeeds (or some other condition is met).
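To make the backoff-and-retry case concrete, this is roughly the behaviour I have in mind, sketched inside foreachRDD (saveToTargetDb is a hypothetical placeholder for the actual write to the target DB; the backoff parameters are just examples):

  import org.apache.spark.rdd.RDD

  // Hypothetical placeholder for the real write to the remote target DB.
  def saveToTargetDb(rdd: RDD[String]): Unit = ???

  // Retry the write with exponential backoff; only give up (and let the output
  // operation fail) after maxAttempts, instead of silently moving on to the next batch.
  def writeWithBackoff(rdd: RDD[String], maxAttempts: Int = 5): Unit = {
    var attempt = 0
    var done = false
    while (!done) {
      try {
        saveToTargetDb(rdd)
        done = true
      } catch {
        case e: Exception =>
          attempt += 1
          if (attempt >= maxAttempts) throw e
          Thread.sleep(math.min(1000L * (1L << attempt), 60000L))
      }
    }
  }

  // stream.foreachRDD { rdd => writeWithBackoff(rdd) }

This keeps a single failing batch from being dropped, but it does not help with the fail-fast case, hence the questions below.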
Any ideas whether/how this could be achieved? More specific questions about the fail-fast option:

1. Is there a way for the driver program to get notified of a job failure (which tasks failed / RDD metadata) before the checkpoint is updated? StreamingContext has an addStreamingListener method, but its onBatchCompleted event carries no indication of batch failure (only completion).

2. Can job scheduling be influenced through public APIs?

3. What is the best way to stop and exit the application? When running in yarn-client mode, calling streamingContext.stop halts the streaming processing, but the driver process does not exit.

Thanks!
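For reference, this is roughly what I have tried so far for question 1 (a sketch; ssc is the StreamingContext):

  import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

  // A batch-completion listener sees timing information in BatchInfo
  // (schedulingDelay, processingDelay, etc.), but as far as I can tell there is
  // no failure flag or metadata about which tasks/RDDs failed.
  ssc.addStreamingListener(new StreamingListener {
    override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
      val info = batchCompleted.batchInfo
      // info.processingDelay / info.schedulingDelay are available here,
      // but nothing tells me whether the batch's output operations actually succeeded.
    }
  })

and for question 3, the kind of shutdown call I mean (the exact flags may differ):

  // Stop the streaming context and the underlying SparkContext without waiting
  // for in-flight batches; in yarn-client mode the driver JVM still does not exit afterwards.
  ssc.stop(stopSparkContext = true, stopGracefully = false)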