I use spot instances for a 100-slave cluster (r3.2xlarge in us-west-1).
The jobs I run usually take about 15 hours, and the cluster is stable and
fast. One or two machines might be terminated, but that is a very rare event
and Spark can handle it.
On Fri, Mar 25, 2016 at 6:28 PM, Sven Krasser wrote:
When a spot instance terminates, you lose all data (RDD partitions) stored
in the executors that ran on that instance. Spark can recreate the
partitions from input data, but if that requires going through multiple
preceding shuffles a good chunk of the job will need to be redone.
-Sven
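To put rough numbers on the point above, here is a minimal toy model in plain Python (not Spark code). It assumes that every stage is separated by a shuffle, that partition p of each stage sits on node p % nodes, and that a lost node takes its partitions and shuffle outputs for every stage with it. The function name and parameters are hypothetical, and real Spark scheduling is more involved than this.

```python
def tasks_to_rerun(num_stages, partitions, nodes, lost):
    """Count the tasks that must be rerun after losing the nodes in `lost`.

    Toy model: each of the `num_stages` stages has `partitions` tasks, and
    the output of task p is stored only on node p % nodes (an assumption).
    """
    rerun = 0
    for _stage in range(num_stages):
        # Every stage's outputs on a lost node are gone, so those tasks
        # have to be recomputed before the next stage can read them.
        rerun += sum(1 for p in range(partitions) if p % nodes in lost)
    return rerun


if __name__ == "__main__":
    # One node out of 100 lost, 200 partitions per stage.
    print(tasks_to_rerun(1, 200, 100, {0}))  # no preceding shuffle
    print(tasks_to_rerun(5, 200, 100, {0}))  # four preceding shuffles
    # Every node lost at once: the whole job so far is redone.
    print(tasks_to_rerun(5, 200, 100, set(range(100))))
```

Under these assumptions the redone work grows with both the number of preceding shuffles and the number of instances lost, so losing one or two machines is cheap while losing all of them at once means starting over from the input data.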
On Thu, Mar
I'm very new to Apache Spark. I'm just a user, not a developer.
I'm running a cluster with many spot instances. Am I correct in
understanding that Spark can handle an unlimited number of spot instance
failures and restarts? Sometimes all the spot instances will disappear
without warning, and then