Hello,

We have a Spark Streaming application (running Spark 1.6.1) that consumes data from a message queue. The application runs in local[*] mode, so the driver and executors share a single JVM.
The issue we are dealing with these days is that if any of our lambda functions throws an exception, the entire processing hangs and the task keeps getting retried ad infinitum. According to the documentation at http://spark.apache.org/docs/1.6.1/configuration.html the default value of spark.task.maxFailures is 4, and we have not changed it, so I don't quite understand why the tasks keep getting retried forever.

How can we make sure that a single bad record causing a RuntimeException does not stop all the processing? So far I have tried adding try/catch blocks around the lambda functions and returning null when an exception is thrown, but that causes NullPointerExceptions inside Spark, with the same net effect: processing of the subsequent batches stops. A trimmed-down sketch of that workaround is below.
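For reference, here is roughly what that workaround looks like. Record and parseRecord below are simplified stand-ins for our actual output type and lambda, not the real code:

import org.apache.spark.streaming.dstream.DStream

// Simplified stand-in for the records we build from the queue messages.
case class Record(id: String, payload: String)

// Stand-in for one of our lambdas; throws a RuntimeException on malformed input.
def parseRecord(raw: String): Record = {
  val parts = raw.split(",", 2)
  Record(parts(0), parts(1)) // ArrayIndexOutOfBoundsException if there is no comma
}

// Current workaround: catch the exception and return null.
// The nulls then trigger NullPointerExceptions further down inside Spark,
// so the batch still fails and the following batches stop being processed.
def parseAll(messages: DStream[String]): DStream[Record] =
  messages.map { raw =>
    try {
      parseRecord(raw)
    } catch {
      case e: Exception => null
    }
  }

Thanks in advance,
NB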