Maybe I should put this another way. If Spark has two jobs, A and B, both of which consume the entire allocated memory pool, is it expected that Spark can launch B before the executor processes tied to A have completely terminated?
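To put rough numbers on why this matters, here's a back-of-envelope sketch (all figures hypothetical; it just assumes each job's executors are sized to the whole worker pool, as in our setup):

```python
# Hypothetical worker sized at 32 GB, with each job's executors
# configured to use the full pool.
worker_limit_gb = 32
job_a_executors_gb = 32   # job A's executors, still shutting down
job_b_executors_gb = 32   # job B's executors, just launched

# During the overlap window both sets of executors are resident:
peak_gb = job_a_executors_gb + job_b_executors_gb
print(peak_gb)  # 64 -> double the worker's configured limit
```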
On Thu, Oct 9, 2014 at 6:57 PM, Keith Simmons <ke...@pulse.io> wrote:

> Actually, it looks like even when the job shuts down cleanly, there can
> be a few minutes of overlap between the time the next job launches and
> the time the first job's process actually terminates. Here are some
> relevant lines from my log:
>
> 14/10/09 20:49:20 INFO Worker: Asked to kill executor
> app-20141009204127-0029/1
> 14/10/09 20:49:20 INFO ExecutorRunner: Runner thread for executor
> app-20141009204127-0029/1 interrupted
> 14/10/09 20:49:20 INFO ExecutorRunner: Killing process!
> 14/10/09 20:49:20 INFO Worker: Asked to launch executor
> app-20141009204508-0030/1 for Job
> ... more lines about launching the new job ...
> 14/10/09 20:51:17 INFO Worker: Executor app-20141009204127-0029/1
> finished with state KILLED
>
> As you can see, the first app didn't actually shut down until two
> minutes after the new job launched. During that time, I was at double
> the worker memory limit.
>
> Keith
>
> On Thu, Oct 9, 2014 at 5:06 PM, Keith Simmons <ke...@pulse.io> wrote:
>
>> Hi Folks,
>>
>> We have a Spark job that is occasionally running out of memory and
>> hanging (I believe in GC). That's its own issue we're debugging, but
>> in the meantime there's another unfortunate side effect. When the job
>> is killed (most often because of GC errors), each worker attempts to
>> kill its respective executor. However, several of the executors fail
>> to shut themselves down (I actually have to kill -9 them). Even though
>> the worker fails to clean up the executor, it starts the next job as
>> though all the resources have been freed. This causes the Spark worker
>> to exceed its configured memory limit, which in turn runs our boxes
>> out of memory. Is there a setting I can configure to prevent this
>> issue? Perhaps by having the worker force-kill the executor, or not
>> start the next job until it's confirmed the executor has exited?
>>
>> Let me know if there's any additional information I can provide.
>>
>> Keith
>>
>> P.S. We're running Spark 1.0.2
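For what it's worth, the overlap window is easy to measure straight from the quoted worker log. A small sketch (the log lines are the three relevant events reproduced from the thread above, re-joined onto single lines):

```python
from datetime import datetime

# The three relevant worker log events from the thread above.
log = [
    "14/10/09 20:49:20 INFO Worker: Asked to kill executor app-20141009204127-0029/1",
    "14/10/09 20:49:20 INFO Worker: Asked to launch executor app-20141009204508-0030/1 for Job",
    "14/10/09 20:51:17 INFO Worker: Executor app-20141009204127-0029/1 finished with state KILLED",
]

def ts(line):
    # The worker log uses "yy/MM/dd HH:mm:ss" timestamps (first 17 chars).
    return datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")

launched_new = next(ts(l) for l in log if "Asked to launch" in l)
old_killed = next(ts(l) for l in log if "state KILLED" in l)
overlap = (old_killed - launched_new).total_seconds()
print(overlap)  # 117.0 seconds during which both apps' executors were resident
```

So in this instance both apps' executor memory was live for just under two minutes, which matches the "double the worker memory limit" observation.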