Hi,

On Fri, Nov 14, 2014 at 3:20 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
> I wonder if SparkConf is dynamically updated on all worker nodes or only
> during initialization. It can be used to piggyback information.
> Otherwise I guess you are stuck with Broadcast.
> Primarily I have had these issues moving legacy MR operators to Spark,
> where MR piggybacks on the Hadoop conf pretty heavily; in Spark-native
> applications it's rarely required. Do you have a use case like that?

My "use case" is http://apache-spark-user-list.1001560.n3.nabble.com/StreamingContext-does-not-stop-td18826.html, that is, notifying my Spark executors that the StreamingContext has been shut down. (Even with a non-graceful shutdown, Spark doesn't seem to end the actual execution, just all the Spark-internal timers etc.) I need to do this properly, or processing will go on for a very long time.

I have been trying to misuse a broadcast variable for this (see the sketch in the P.S. below):
- create a class with a boolean var, set to true
- query this boolean on the executors as a prerequisite to processing the next item
- when I want to shut down, set the boolean to false and unpersist the broadcast variable (which will trigger re-delivery)

This is very dirty, but it works with a "local[*]" master. Unfortunately, when deployed on YARN, the new value never arrives at my executors.

Any idea what could go wrong on YARN with this approach, or what would be a "good" way to do this?

Thanks
Tobias
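
P.S. In case it helps, here is roughly what my broadcast hack looks like. This is only a minimal sketch of the approach described above: ShutdownFlag, ShutdownFlagDemo and the socket source are placeholder names of mine, not anything from the Spark API, and the re-delivery after unpersist() is exactly the undocumented behavior that seems to break on YARN.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Driver-side flag; executors receive a serialized copy via the broadcast.
    class ShutdownFlag extends Serializable {
      @volatile var keepRunning: Boolean = true
    }

    object ShutdownFlagDemo {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("shutdown-flag-demo"), Seconds(1))

        val flag   = new ShutdownFlag
        val flagBc = ssc.sparkContext.broadcast(flag)

        ssc.socketTextStream("localhost", 9999).foreachRDD { rdd =>
          rdd.foreach { line =>
            // Executor side: check the flag before processing the next item.
            if (flagBc.value.keepRunning) {
              println(line) // stand-in for the real per-item work
            }
          }
        }

        ssc.start()

        // Later, on shutdown (driver side):
        //   flag.keepRunning = false          // mutate the driver's copy
        //   flagBc.unpersist(blocking = true) // drop cached copies so the next
        //                                     // .value re-fetches (and, one hopes,
        //                                     // re-serializes) the updated flag
        //   ssc.stop(stopSparkContext = true, stopGracefully = false)

        ssc.awaitTermination()
      }
    }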