Maybe I should put this another way. If Spark has two jobs, A and B, both of which consume the entire allocated memory pool, is it expected that Spark can launch B before the executor processes tied to A have completely terminated?
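To put rough numbers on why this matters, here's a back-of-envelope sketch (all figures hypothetical; it just assumes each job's executors are sized to the whole worker pool, as in our setup):

```python
# Hypothetical worker sized at 32 GB, with each job's executors
# configured to use the full pool.
worker_limit_gb = 32
job_a_executors_gb = 32   # job A's executors, still shutting down
job_b_executors_gb = 32   # job B's executors, just launched

# During the overlap window both sets of executors are resident:
peak_gb = job_a_executors_gb + job_b_executors_gb
print(peak_gb)  # 64 -> double the worker's configured limit
```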
On Thu, Oct 9, 2014 at 6:57 PM, Keith Simmons <ke...@pulse.io> wrote:

> Actually, it looks like even when the job shuts down cleanly, there can
> be a few minutes of overlap between the time the next job launches and
> the time the first job's process actually terminates. Here are some
> relevant lines from my log:
>
> 14/10/09 20:49:20 INFO Worker: Asked to kill executor
> app-20141009204127-0029/1
> 14/10/09 20:49:20 INFO ExecutorRunner: Runner thread for executor
> app-20141009204127-0029/1 interrupted
> 14/10/09 20:49:20 INFO ExecutorRunner: Killing process!
> 14/10/09 20:49:20 INFO Worker: Asked to launch executor
> app-20141009204508-0030/1 for Job
> ... more lines about launching the new job ...
> 14/10/09 20:51:17 INFO Worker: Executor app-20141009204127-0029/1
> finished with state KILLED
>
> As you can see, the first app didn't actually shut down until two
> minutes after the new job launched. During that time, I was at double
> the worker memory limit.
>
> Keith
>
> On Thu, Oct 9, 2014 at 5:06 PM, Keith Simmons <ke...@pulse.io> wrote:
>
>> Hi Folks,
>>
>> We have a Spark job that is occasionally running out of memory and
>> hanging (I believe in GC). That's its own issue we're debugging, but
>> in the meantime there's another unfortunate side effect. When the job
>> is killed (most often because of GC errors), each worker attempts to
>> kill its respective executor. However, several of the executors fail
>> to shut themselves down (I actually have to kill -9 them). Even though
>> the worker fails to clean up the executor, it starts the next job as
>> though all the resources have been freed. This causes the Spark worker
>> to exceed its configured memory limit, which in turn runs our boxes
>> out of memory. Is there a setting I can configure to prevent this
>> issue? Perhaps by having the worker force-kill the executor, or not
>> start the next job until it's confirmed the executor has exited?
>>
>> Let me know if there's any additional information I can provide.
>>
>> Keith
>>
>> P.S. We're running Spark 1.0.2
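For what it's worth, the overlap window is easy to measure straight from the quoted worker log. A small sketch (the log lines are the three relevant events reproduced from the thread above, re-joined onto single lines):

```python
from datetime import datetime

# The three relevant worker log events from the thread above.
log = [
    "14/10/09 20:49:20 INFO Worker: Asked to kill executor app-20141009204127-0029/1",
    "14/10/09 20:49:20 INFO Worker: Asked to launch executor app-20141009204508-0030/1 for Job",
    "14/10/09 20:51:17 INFO Worker: Executor app-20141009204127-0029/1 finished with state KILLED",
]

def ts(line):
    # The worker log uses "yy/MM/dd HH:mm:ss" timestamps (first 17 chars).
    return datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")

launched_new = next(ts(l) for l in log if "Asked to launch" in l)
old_killed = next(ts(l) for l in log if "state KILLED" in l)
overlap = (old_killed - launched_new).total_seconds()
print(overlap)  # 117.0 seconds during which both apps' executors were resident
```

So in this instance both apps' executor memory was live for just under two minutes, which matches the "double the worker memory limit" observation.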