I use spot instances for a 100-slave cluster (r3.2xlarge in us-west-1).
The jobs I run usually take about 15 hours, and the cluster is stable and
fast. One or two machines might be terminated, but that is a very rare event
and Spark can handle it.
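
To make the cost of a lost instance concrete, here is a rough toy model in
plain Python (not Spark code). It assumes a job is a linear chain of stages
separated by shuffles, and that a stage's output survives a terminated
instance only if it was checkpointed to durable storage. The function name
and parameters are made up for illustration; real Spark recomputes only the
lost partitions and can reuse shuffle files on surviving nodes, so this is a
simplified worst-case sketch.

```python
# Toy model (not Spark code): which stages must be re-run for partitions
# that were lost when a spot instance was terminated.
# Assumptions: stages form a linear chain, each separated by a shuffle, and
# only checkpointed stage output survives the loss of an instance.

def stages_to_recompute(num_stages, lost_at_stage, checkpointed):
    """Return the 0-based stage indices to re-run for the lost partitions.

    num_stages:    total number of stages in the job
    lost_at_stage: stage that was running when the instance disappeared
    checkpointed:  set of stage indices whose output is on durable storage
    """
    if not 0 <= lost_at_stage < num_stages:
        raise ValueError("lost_at_stage out of range")
    # Walk back through the lineage until durable output is found;
    # if none is found, recomputation starts from the input data (stage 0).
    start = 0
    for stage in range(lost_at_stage - 1, -1, -1):
        if stage in checkpointed:
            start = stage + 1
            break
    return list(range(start, lost_at_stage + 1))


if __name__ == "__main__":
    # No checkpoints: a loss in the last of 5 stages replays the whole chain.
    print(stages_to_recompute(5, 4, set()))
    # Output of stage 2 checkpointed: only stages 3 and 4 are replayed.
    print(stages_to_recompute(5, 4, {2}))
```

In this model, a termination late in a long job with no checkpoints replays
every preceding stage for the lost partitions, which is why rare terminations
are tolerable but frequent ones are expensive.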

On Fri, Mar 25, 2016 at 6:28 PM, Sven Krasser <kras...@gmail.com> wrote:

> When a spot instance terminates, you lose all data (RDD partitions) stored
> in the executors that ran on that instance. Spark can recreate the
> partitions from input data, but if that requires going through multiple
> preceding shuffles, a good chunk of the job will need to be redone.
> -Sven
>
> On Thu, Mar 24, 2016 at 10:15 PM, Dillian Murphey <crackshotm...@gmail.com> wrote:
>
>> I'm very new to Apache Spark. I'm just a user, not a developer.
>>
>> I'm running a cluster with many spot instances. Am I correct in
>> understanding that Spark can handle an unlimited number of spot instance
>> failures and restarts?  Sometimes all the spot instances will disappear
>> without warning, and then they come back.  Can I trust Spark to pick up all
>> jobs where it left off?
>>
>> I'm noticing some instability with my system. I suspect it could be
>> disk or RAM issues.  When I add a lot of slaves I run low on RAM on my
>> master.  Maybe that's part of the problem. But I just want to confirm my
>> understanding.
>>
>
>
>
> --
> www.skrasser.com <http://www.skrasser.com/?utm_source=sig>
>
