Hi there, I am trying to implement a listener that acts as a post-processor, storing data about what was processed or erred. For this I build up an RDD that may or may not change over the course of the application.
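Here is a minimal sketch of what I am trying (the class, keyspace, and table names are just placeholders, and the bookkeeping records are simplified to (id, status) pairs):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}
    import com.datastax.spark.connector._

    // Accumulates bookkeeping records while the job runs and tries to
    // persist them to Cassandra once the application ends.
    class PostProcessingListener(sc: SparkContext) extends SparkListener {
      val results = scala.collection.mutable.ArrayBuffer[(String, String)]()

      override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = {
        // On the server, this is the line that fails with
        // "Cannot call methods on a stopped SparkContext".
        sc.parallelize(results).saveToCassandra("my_keyspace", "post_processing")
      }
    }

    val conf = new SparkConf().setAppName("my-job")
    val sc   = new SparkContext(conf)
    sc.addSparkListener(new PostProcessingListener(sc))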
My thought was to use onApplicationEnd and a saveToCassandra call to persist this (roughly as sketched above). From what I've gathered in my experiments, onApplicationEnd doesn't get called until sparkContext.stop() is called; if I don't call stop() in my code, the listener is never invoked. This works fine in my local tests: stop() gets called, the listener runs, the data is persisted to the db, and everything works. However, when I run this on our server, the code in onApplicationEnd throws the following exception:

    Task serialization failed: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext

What's the best way to resolve this? One option I can think of is creating a new SparkContext inside the listener (I think I would have to turn on allowing multiple contexts, in case I create the new one before the old one is fully stopped). It seems odd, but it might be doable. Alternatively, what if I simply structured my job procedurally, doJob followed by doPostProcessing (sketched in the P.S. below)? Does that guarantee the post-processing runs after the main job?

We are running Spark 1.2 in standalone mode. Please let me know if you need more details.

Thanks for the assistance!
Sumona
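P.S. By the procedural approach I mean something roughly like this (doJob and doPostProcessing are just placeholders for my own methods):

    val sc = new SparkContext(conf)
    try {
      val results = doJob(sc)          // main processing
      doPostProcessing(sc, results)    // persist bookkeeping while sc is still alive
    } finally {
      sc.stop()                        // the listener's onApplicationEnd fires only after this
    }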