Agreed that it is silly for the Master to kill off an Application when Executors from the same set of nodes repeatedly die. The same problem can strike if a single node enters a state where any Executor created on it quickly dies (e.g., a block device becomes faulty). In that case the Application is prevented from launching at all, even though only one node is bad.
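To make the failure mode concrete, here is a toy model of the counting behavior Mark describes in the quoted message: one retry counter per Application, bumped on every abnormal Executor exit, with no regard for which worker failed or whether other Executors are healthy. This is only a sketch and not the actual Master code; `RetrySketch`, `AppInfo`, `executorFinished` and `healthyExecutors` are made-up names, and the limit of 10 is taken from the description of ApplicationState.MAX_NUM_RETRY in the thread.

```scala
// Toy model only: these names are hypothetical and are not Spark's API.
object RetrySketch {
  // Mirrors the non-configurable limit of 10 mentioned in the thread.
  val MaxNumRetry = 10

  final class AppInfo(val id: String) {
    var retryCount = 0
    var removed = false
    // Workers on which this application's Executors are running fine.
    var healthyExecutors = Set.empty[String]
  }

  /** Called once per finished Executor. A missing exit status counts as abnormal. */
  def executorFinished(app: AppInfo, worker: String, exitStatus: Option[Int]): Unit = {
    val normalExit = exitStatus.contains(0)
    if (!normalExit && !app.removed) {
      // The counter is per application: `worker` is never consulted,
      // and neither is the set of healthy Executors.
      app.retryCount += 1
      if (app.retryCount >= MaxNumRetry) app.removed = true
    }
  }

  def main(args: Array[String]): Unit = {
    val app = new AppInfo("app-1")
    app.healthyExecutors = Set("old-worker-1", "old-worker-2")
    // A single bad worker keeps failing to start Executors.
    (1 to MaxNumRetry).foreach(_ => executorFinished(app, "bad-worker", Some(1)))
    println(s"removed=${app.removed}, healthy=${app.healthyExecutors.size}")
  }
}
```

In this model one bad worker is enough to get the Application removed while its other Executors are still fine. Counting failures per worker, or skipping removal while any Executor is still running, would be two possible ways to avoid that.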
On Wed, Jul 9, 2014 at 3:08 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

> Actually, I'm thinking about re-purposing it. There's a nasty behavior
> that I'll open a JIRA for soon, and that I'm thinking about addressing by
> introducing/using another ExecutorState transition. The basic problem is
> that Master can be overly aggressive in calling removeApplication on
> ExecutorStateChanged. For example, say you have a working, long-running
> Spark stand-alone-mode application and then try to add some more worker
> nodes, but manage to misconfigure the new nodes so that on the new nodes
> Executors never successfully start. In that scenario, you will repeatedly
> end up in the !normalExit branch of Master's receive ExecutorStateChanged,
> quickly exceed ApplicationState.MAX_NUM_RETRY (a non-configurable 10, which
> is another irritation), and end up having your application killed off even
> though it is still running successfully on the old worker nodes.
>
> On Wed, Jul 9, 2014 at 2:49 PM, Kay Ousterhout <k...@eecs.berkeley.edu>
> wrote:
>
> > Git history to the rescue! It seems to have been added by Matei way back
> > in July 2012:
> >
> > https://github.com/apache/spark/commit/5d1a887bed8423bd6c25660910d18d91880e01fe
> >
> > and then was removed a few months later (replaced by RUNNING) by the same
> > Mr. Zaharia:
> >
> > https://github.com/apache/spark/commit/bb1bce79240da22c2677d9f8159683cdf73158c2#diff-776a630ac2b2ec5fe85c07ca20a58fc0
> >
> > So I'd say it's safe to delete it.
> >
> > On Wed, Jul 9, 2014 at 2:36 PM, Mark Hamstra <m...@clearstorydata.com>
> > wrote:
> >
> > > Doesn't look to me like this is used. Does anybody recall what it was
> > > intended for?