Hi there, I am trying to implement a listener that acts as a post-processor, storing data about what was processed or erred. For this I build up an RDD that may or may not change over the course of the application.
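Here is a minimal sketch of what I am trying (the class, keyspace, and table names are just placeholders, and the bookkeeping records are simplified to (id, status) pairs):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}
    import com.datastax.spark.connector._

    // Accumulates bookkeeping records while the job runs and tries to
    // persist them to Cassandra once the application ends.
    class PostProcessingListener(sc: SparkContext) extends SparkListener {
      val results = scala.collection.mutable.ArrayBuffer[(String, String)]()

      override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = {
        // On the server, this is the line that fails with
        // "Cannot call methods on a stopped SparkContext".
        sc.parallelize(results).saveToCassandra("my_keyspace", "post_processing")
      }
    }

    val conf = new SparkConf().setAppName("my-job")
    val sc   = new SparkContext(conf)
    sc.addSparkListener(new PostProcessingListener(sc))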
My thought was to use onApplicationEnd and a saveToCassandra call to persist this (roughly as sketched above). From what I've gathered in my experiments, onApplicationEnd doesn't get called until sparkContext.stop() is called; if I don't call stop() in my code, the listener is never invoked. This works fine in my local tests: stop() gets called, the listener runs, the data is persisted to the db, and everything works. However, when I run this on our server, the code in onApplicationEnd throws the following exception:

    Task serialization failed: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext

What's the best way to resolve this? One option I can think of is creating a new SparkContext inside the listener (I think I would have to turn on allowing multiple contexts, in case I create the new one before the old one is fully stopped). It seems odd, but it might be doable. Alternatively, what if I simply structured my job procedurally, doJob followed by doPostProcessing (sketched in the P.S. below)? Does that guarantee the post-processing runs after the main job?

We are running Spark 1.2 in standalone mode. Please let me know if you need more details.

Thanks for the assistance!
Sumona
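P.S. By the procedural approach I mean something roughly like this (doJob and doPostProcessing are just placeholders for my own methods):

    val sc = new SparkContext(conf)
    try {
      val results = doJob(sc)          // main processing
      doPostProcessing(sc, results)    // persist bookkeeping while sc is still alive
    } finally {
      sc.stop()                        // the listener's onApplicationEnd fires only after this
    }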