Ok, I understand.

Yes, I will have to handle them in the main thread.
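
Something like this, I suppose (doJob and persistResults are just placeholder
names for my main processing and the saveToCassandra step):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("my-job"))
  try {
    val results = doJob(sc)        // main processing (placeholder)
    persistResults(results)        // post-processing, e.g. results.saveToCassandra(...)
  } finally {
    sc.stop()                      // stop only once all jobs have been submitted
  }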

Thanks!
Sumona



On Wed, Feb 17, 2016 at 12:24 PM Shixiong(Ryan) Zhu <shixi...@databricks.com>
wrote:

> `onApplicationEnd` is posted when the SparkContext is stopping, and you cannot
> submit any job to a stopping SparkContext. In general, SparkListener is
> used to monitor job progress and collect job information, and you should
> not submit jobs there. Why not submit your jobs in the main thread?
>
> On Wed, Feb 17, 2016 at 7:11 AM, Sumona Routh <sumos...@gmail.com> wrote:
>
>> Can anyone provide some insight into the flow of SparkListeners,
>> specifically onApplicationEnd? I'm having issues with the SparkContext
>> being stopped before my final processing can complete.
>>
>> Thanks!
>> Sumona
>>
>> On Mon, Feb 15, 2016 at 8:59 AM Sumona Routh <sumos...@gmail.com> wrote:
>>
>>> Hi there,
>>> I am trying to implement a listener that acts as a post-processor,
>>> storing data about what was processed or errored. For this, I use an
>>> RDD that may or may not change over the course of the application.
>>>
>>> My thought was to use onApplicationEnd and then a saveToCassandra call
>>> to persist this.
>>>
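>>> Concretely, the listener looks roughly like this (class, schema, and
>>> table names are illustrative):
>>>
>>>   import com.datastax.spark.connector._
>>>   import org.apache.spark.rdd.RDD
>>>   import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}
>>>
>>>   case class ProcessedRecord(id: String, status: String)  // illustrative schema
>>>
>>>   class PostProcessingListener(results: RDD[ProcessedRecord]) extends SparkListener {
>>>     override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = {
>>>       // Persist the accumulated results when the application ends
>>>       results.saveToCassandra("my_keyspace", "results_table")
>>>     }
>>>   }
>>>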
>>> From what I've gathered in my experiments, onApplicationEnd doesn't
>>> get called until sparkContext.stop() is called; if I don't call stop in
>>> my code, the listener is never invoked. This works fine in my local tests:
>>> stop gets called, the listener runs and persists to the db, and everything
>>> works. However, when I run this on our server, the code in
>>> onApplicationEnd throws the following exception:
>>>
>>> Task serialization failed: java.lang.IllegalStateException: Cannot call
>>> methods on a stopped SparkContext
>>>
>>> What's the best way to resolve this? One option is to create a new
>>> SparkContext in the listener (I think I would have to allow multiple
>>> contexts, in case I create one before the other is fully stopped). That
>>> seems odd but might be doable. Alternatively, what if I simply structured
>>> my job procedurally, as doJob followed by doPostProcessing: does that
>>> guarantee the post-processing runs after the job?
>>>
>>> We are currently running Spark 1.2 in standalone mode.
>>>
>>> Please let me know if you require more details. Thanks for the
>>> assistance!
>>> Sumona
>>>
>>>
>
