Hello all, I have some questions regarding the foreachRDD output function in Spark Streaming.
The programming guide (http://spark.apache.org/docs/1.1.0/streaming-programming-guide.html) describes how to output data using a network connection on the worker nodes. Are there any working examples of how to do this properly? Most of the guide describes what not to do rather than what to do.

What is the best way to write tests for such code, for example to make sure that connection objects are used properly?

How should network or other problems on a worker node be handled? Can I throw an exception to force Spark to retry that data on another node? As an example, consider a program that writes data to an SQL database using foreachRDD. One worker node might have trouble connecting to the database, so another node would have to finish the output operation.

Thanks!

-- Jesper Lundgren