Re: Spark resilience

2014-04-15 Thread Aaron Davidson
1. Spark prefers to run tasks where the data is, but it can move cached data between executors if no cores are available where the data was initially cached (which is often much faster than recomputing the data from scratch). The result is that data is automatically spread out across the cluster.
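A minimal Scala sketch of the behavior Aaron describes (the app name and input path are hypothetical; spark.locality.wait is the scheduler setting that bounds how long a task waits for a free core on a node holding its cached partition):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val conf = new SparkConf().setAppName("locality-sketch") // hypothetical name
    val sc   = new SparkContext(conf)

    // Cache the parsed data; the first action materializes the cache on
    // whichever executors actually run the tasks.
    val lines  = sc.textFile("hdfs:///data/events.log")      // hypothetical path
    val parsed = lines.map(_.split(",")).persist(StorageLevel.MEMORY_ONLY)
    parsed.count()

    // Later jobs prefer the executors holding the cached partitions; if no
    // core frees up there within spark.locality.wait, the scheduler runs the
    // task elsewhere and the cached block is fetched over the network, which
    // is usually still cheaper than recomputing it from scratch.
    parsed.filter(_.length > 2).count()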

Re: Spark resilience

2014-04-15 Thread Arpit Tak
1. If we add more executors to the cluster when data is already cached (the RDDs already live on the original executors), will the job run tasks on the new executors even though the RDDs are not present there? If yes, how is the performance on the new executors? 2. What is the replication factor in Spark?
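For reference on the second question, Spark exposes the replication factor through its storage levels: the default levels keep a single copy of each cached partition, while the "_2" variants keep two replicas on different nodes. A small sketch (rddA, rddB, and rddC stand in for arbitrary RDDs):

    import org.apache.spark.storage.StorageLevel

    // A storage level can be set only once per RDD; these lines show
    // alternatives, not a sequence of calls on one RDD.
    rddA.persist(StorageLevel.MEMORY_ONLY)        // one copy per partition (the cache() default)
    rddB.persist(StorageLevel.MEMORY_ONLY_2)      // two in-memory replicas
    rddC.persist(StorageLevel.MEMORY_AND_DISK_2)  // two replicas, spilling to disk if needed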

Re: Spark resilience

2014-04-15 Thread Manoj Samel
Thanks Aaron, this is useful! - Manoj On Mon, Apr 14, 2014 at 8:12 PM, Aaron Davidson wrote: > Launching drivers inside the cluster was a feature added in 0.9, for standalone cluster mode: http://spark.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster

Re: Spark resilience

2014-04-14 Thread Aaron Davidson
Launching drivers inside the cluster was a feature added in 0.9, for standalone cluster mode: http://spark.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster. Note the "supervise" flag, which will cause the driver to be restarted if it fails. This is a rather low-level mechanism.
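A sketch of the launch syntax from the linked 0.9 standalone docs (the cluster URL, jar location, and main class below are placeholders):

    # Spark 0.9 standalone cluster mode; --supervise restarts the driver on failure.
    ./bin/spark-class org.apache.spark.deploy.Client launch \
        --supervise \
        spark://master-host:7077 \
        hdfs://namenode:8020/apps/my-app.jar \
        com.example.MyApp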

Re: Spark resilience

2014-04-14 Thread Manoj Samel
Could you please elaborate how drivers can be restarted automatically? Thanks, On Mon, Apr 14, 2014 at 10:30 AM, Aaron Davidson wrote: > Master and slave are somewhat overloaded terms in the Spark ecosystem (see the glossary: http://spark.apache.org/docs/latest/cluster-overview.html#glossary).

Re: Spark resilience

2014-04-14 Thread Ian Ferreira
Thanks Aaron. On Monday, April 14, 2014 at 10:30 AM, Aaron Davidson wrote: > Master and slave are somewhat overloaded terms in the Spark ecosystem (see the glossary: http://spark.apache.org/docs/latest/cluster-overview.html#glossary). Are you actually asking about the Spark "driver" and "executors", or the standalone cluster "master" and "workers"?

Re: Spark resilience

2014-04-14 Thread Aaron Davidson
Master and slave are somewhat overloaded terms in the Spark ecosystem (see the glossary: http://spark.apache.org/docs/latest/cluster-overview.html#glossary). Are you actually asking about the Spark "driver" and "executors", or the standalone cluster "master" and "workers"? To briefly answer for either: …